Preface


ASIACRYPT 2012, the 18th International Conference on Theory and Application
of Cryptology and Information Security, was held during December 2-6
at the Beijing International Convention Center, Beijing, China. The conference was
sponsored by the International Association for Cryptologic Research (IACR) in
cooperation with the Chinese Association for Cryptologic Research (CACR). It
was also co-sponsored by the National Natural Science Foundation of China,
Huawei Technologies Co. Ltd., and Intel Corporation.

From 241 valid submissions, 43 were accepted for publication after a very 
tough evaluation process. The Program Committee (PC) with the help of 256 
external reviewers provided at least three independent reviews for each paper, 
and five or more for those with PC contributions. 

There were also two invited talks. On Monday, Dan Boneh delivered “Pairing- 
based Cryptography: Past, Present, and Future” as the IACR Distinguished Lec- 
ture. On Wednesday, Chuanming Zong spoke on “Some Mathematical Mysteries 
in Lattices.” In addition to the invited talks, the conference also held a Rump 
Session, full of academic opinions and enjoyment. 

We selected a particularly large and broad PC and encouraged members to focus
on the positive aspects of submissions. During the one-and-a-half-month
independent review phase, each PC member had about 28 submissions to review,
and both the PC members and the external reviewers worked very hard and efficiently.
In the following one-month discussion phase, PC members exchanged opinions
daily on the discussion board. We also relayed anonymized questions from
the PC members to authors, which improved the quality of the reviews.

We would like to thank the authors of all 241 submissions. Their contributions 
made this conference possible. We are extremely grateful to the PC members for 
their enormous investment of time and effort in the difficult and delicate process 
of review and selection, especially given the last decision days were in the midst 
of summer vacation time. A list of PC members and external reviewers can be 
found on the succeeding pages of this volume. We would like to thank Xuejia 
Lai, Zhijun Qiang, Hao Chen, Juan Liu, Dongdai Lin, Bao Li, Meiqin Wang and 
Jialin Huang for the conference organization. Special thanks go to Shai Halevi 
for providing and setting up the splendid review software. We are most grateful 
to Yue Sun, who provided technical support for the entire ASIACRYPT 2012 
review process. We are also grateful to Dong Hoon Lee, the ASIACRYPT 2011 
Program Chair, for his timely information and replies to the host of questions 
we posed during the process. 


September 2012 


Xiaoyun Wang 
Kazue Sako


Pairing-Based Cryptography:
Past, Present, and Future


Dan Boneh* 

Stanford University 
dabo@cs.stanford.edu


Abstract. While pairings were first introduced in cryptography as a tool to attack 
the discrete-log problem on certain elliptic curves, they have since found numer- 
ous applications in the construction of cryptographic systems. To this day many 
problems can only be solved using pairings. A few examples include collusion- 
resistant broadcast encryption and traitor tracing with short keys, 3-way Diffie- 
Hellman, and short signatures. 

In this talk we survey some of the existing applications of pairings to cryptog- 
raphy, but mostly focus on open problems that cannot currently be solved using 
pairings. In particular we explain where the current techniques fail and outline a 
few potential directions for future progress. 

One of the central applications of pairings is identity-based encryption and its 
generalization to functional encryption. While identity-based encryption can be 
built using arithmetic modulo composites and using lattices, constructions based 
on pairings currently provide the most expressive functional encryption systems. 
Constructing comparable functional encryption systems from lattices and com- 
posite arithmetic is a wonderful open problem. Again we survey the state of the 
art and outline a few potential directions for further progress. 

Going beyond pairings (a.k.a. bilinear maps), a central open problem in public-key
cryptography is constructing a secure trilinear or, more generally, a secure
n-linear map. That is, construct groups $G$ and $G_T$ where discrete-log in $G$ is
intractable and yet there is an efficiently computable non-degenerate n-linear map
$e : G^n \to G_T$. Such a construct can lead to powerful solutions to the problems
mentioned in the first paragraph as well as to new functional encryption and
homomorphic encryption systems. Currently, no such construct is known and we
hope this talk will encourage further research on this problem.
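
For orientation, the property usually meant by "n-linear" can be written as follows. This is the standard textbook formulation and is an assumption here, since the abstract does not spell it out; the symbols $g_i$ and $a_i$ are illustrative.

```latex
% n-linearity: exponents can be pulled out of every slot of the map.
\[
  e\bigl(g_1^{a_1}, g_2^{a_2}, \ldots, g_n^{a_n}\bigr)
  \;=\; e(g_1, g_2, \ldots, g_n)^{\,a_1 a_2 \cdots a_n}
  \qquad \text{for all } g_i \in G,\ a_i \in \mathbb{Z}.
\]
% Non-degeneracy: if g generates G, then e(g, \ldots, g) generates G_T.
% The case n = 2 is an ordinary (symmetric) pairing.
```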


* Supported by NSF, DARPA, AFOSR, Google, and Samsung.


Some Mathematical Mysteries in Lattices


Chuanming Zong 
Peking University 


Lattices, as basic objects in Mathematics, have been studied by many prominent
figures, including Gauss, Hermite, Voronoi, Minkowski, Davenport, Hlawka,
Rogers and many others still active today. They are among the most important
cornerstones of the Geometry of Numbers, a classic branch of Number Theory. During
recent decades, this pure mathematical concept has found remarkable applications
in Cryptography, in particular through its algorithmic aspects. The main purpose
of this talk is to present some basic mathematical problems and results (old
and new) about lattices, which may prove useful in Cryptography in the future.
These problems reflect some of the main interests of mathematicians
working on lattices.

Before Minkowski, lattices were mainly studied through positive definite
quadratic forms. In fact, determining the minimal value of a positive definite
quadratic form at integer points is equivalent to determining the length of the
shortest vectors (other than o) of a lattice, which is also equivalent to determining
the maximal density of the corresponding lattice ball packings.
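
The first equivalence can be made concrete: for a lattice with basis vectors $b_1, \ldots, b_n$ and Gram matrix $Q = (\langle b_i, b_j \rangle)$, the squared length of the lattice vector $\sum_i x_i b_i$ equals $x^T Q x$. The sketch below checks this on the planar hexagonal lattice by brute force. The function names and the search bound are illustrative assumptions, and the bounded search is only adequate for small, well-reduced bases such as this one.

```python
import itertools
import math

def gram_matrix(basis):
    """Gram matrix Q[i][j] = <b_i, b_j> of the basis vectors (rows of `basis`)."""
    return [[sum(a * b for a, b in zip(bi, bj)) for bj in basis] for bi in basis]

def quadratic_form(gram, x):
    """Value x^T Q x of the quadratic form at the integer point x."""
    n = len(x)
    return sum(x[i] * gram[i][j] * x[j] for i in range(n) for j in range(n))

def min_nonzero_form_value(gram, bound=3):
    """Minimum of the form over nonzero integer points in [-bound, bound]^n.

    Assumption: the bound is large enough to contain a minimizer, which holds
    for the reduced basis used below but not for arbitrary bases.
    """
    n = len(gram)
    points = itertools.product(range(-bound, bound + 1), repeat=n)
    return min(quadratic_form(gram, x) for x in points if any(x))

# Hexagonal lattice: two unit vectors at an angle of 60 degrees.
hex_basis = [(1.0, 0.0), (0.5, math.sqrt(3) / 2)]
hex_gram = gram_matrix(hex_basis)

# Shortest nonzero vector length = square root of the minimal form value.
shortest_length = math.sqrt(min_nonzero_form_value(hex_gram))
```

Here the minimal value of the form is attained, for instance, at the integer point (1, 0), which corresponds to the basis vector of length 1.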

It was Minkowski who first studied the density $\delta^*(C)$ of the densest lattice
packings of a given centrally symmetric convex body $C$. In particular, he obtained
the first general lower bound of $\delta^*(C)$ when $C$ is the $n$-dimensional unit ball $B$. In
fact, to determine the density $\delta^*(C)$ is to estimate the maximal length of the
shortest vectors of the lattices of determinant 1 with respect to a certain metric
determined by $C$. When $C$ is the unit ball, the metric is just the ordinary
Euclidean metric. Therefore, the shortest vector problem is a particular case of
the study of $\delta^*(B)$. There are lower and upper bounds for $\delta^*(C)$ and
$\delta^*(B)$; however, the asymptotic orders of both $\min \delta^*(C)$ and $\delta^*(B)$ are unknown.
For lattice kissing numbers we face a similar situation.

The density $\theta^*(C)$ of the thinnest lattice covering of a centrally symmetric
convex body $C$ was first systematically studied by Rogers. In fact, determining it is
equivalent to determining the minimal length of the longest distance from a point to the
lattices of determinant 1 with respect to the metric determined by $C$. Therefore,
the closest vector problem is a particular case of the study of $\theta^*(B)$. For a
particular object $C$, such as a ball in a given dimension, little is known about the
exact value of $\theta^*(C)$.

Let $\gamma^*(C)$ be the smallest number such that there is a lattice $\Lambda$ for which $C + \Lambda$ is
a packing and $\gamma^*(C)C + \Lambda$ is a covering. Equivalently, in every lattice packing
$C + \Lambda$ there is a hole in which one can put a translate of $(\gamma^*(C) - 1)C$. In
1950, Rogers introduced and studied this number, in particular for the unit ball.
In fact, $\gamma^*(C)$ is a bridge connecting $\delta^*(C)$ and $\theta^*(C)$. In other words, it is a
bridge connecting the packing radius and the covering radius of a lattice, with
respect to the metric determined by $C$. Some results about $\gamma^*(C)$ and $\gamma^*(B)$ are
known. At the same time, a number of fascinating mysteries about $\gamma^*(C)$ and
their possible consequences remain unsolved.

Can you imagine that in every three-dimensional lattice ball packing there is
a straight line of infinite length which does not meet any of the balls, and that, when $n$
is large, in every $n$-dimensional lattice ball packing there is a free hyperplane of
dimension more or less $n/\log n$? But this is true!



Constant-Size Structure-Preserving Signatures: 
Generic Constructions and Simple Assumptions 


Masayuki Abe¹, Melissa Chase², Bernardo David³,
Markulf Kohlweiss², Ryo Nishimaki¹, and Miyako Ohkubo⁴

¹ NTT Secure Platform Laboratories
{abe.masayuki,nishimaki.ryo}@lab.ntt.co.jp
² Microsoft Research
{melissac,markulf}@microsoft.com
³ University of Brasilia
bernardo.david@aluno.unb.br
⁴ Security Architecture Laboratory, NSRI, NICT
m.ohkubo@nict.go.jp


Abstract. This paper presents efficient structure-preserving signature schemes 
based on assumptions as simple as Decisional-Linear. We first give two general 
frameworks for constructing fully secure signature schemes from weaker build- 
ing blocks such as variations of one-time signatures and random-message secure 
signatures. They can be seen as refinements of the Even-Goldreich-Micali frame- 
work, and preserve many desirable properties of the underlying schemes such as 
constant signature size and structure preservation. We then instantiate them based 
on simple (i.e., not q-type) assumptions over symmetric and asymmetric bilinear 
groups. The resulting schemes are structure-preserving and yield constant-size 
signatures consisting of 11 to 17 group elements, which compares favorably to
existing schemes relying on q-type assumptions for their security.

Keywords: Structure-preserving signatures, One-time signatures, Groth-Sahai 
proof system, Random message attacks. 


1 Introduction 

A structure-preserving signature (SPS) scheme [?] is a digital signature scheme with
two structural properties: (i) the verification keys, messages, and signatures are all
elements of a bilinear group; and (ii) the verification algorithm checks a conjunction of
pairing product equations over the key, the message and the signature. This makes SPS schemes
compatible with the efficient non-interactive proof system for pairing-product equations
by Groth and Sahai (GS) [?]. Structure-preserving cryptographic primitives promise
to combine the advantages of optimized number theoretic non-blackbox constructions
with the modularity and insight of protocols that use only generic cryptographic
building blocks.

Indeed, the instantiation of known generic constructions with an SPS scheme and the
GS proof system has led to many new and more efficient schemes: Groth [?] showed
how to construct an efficient simulation-sound zero-knowledge proof system (ss-NIZK)
building on generic constructions of [?]. Abe et al. [?] show how to obtain efficient
round-optimal blind signatures by instantiating a framework by Fischlin [?]. SPS
are also important building blocks for a wide range of cryptographic functionalities such
as anonymous proxy signatures [?], delegatable anonymous credentials [?], transferable
e-cash [?] and compact verifiable shuffles [?]. Most recently, [?] show how to
construct a structure-preserving tree-based signature scheme with a tight security
reduction following the approach of [?]. This signature scheme is then used to build an
ss-NIZK which in turn is used with the Naor-Yung-Sahai [?] paradigm to build the
first CCA-secure public-key encryption scheme with a tight security reduction.
Examples of other schemes that benefit from efficient SPS are [?].

Because properties (i) and (ii) are the only dependencies on the SPS scheme made by
these constructions, any structure-preserving signature scheme can be used as a drop-in
replacement. Unfortunately, all known efficient instantiations of SPS [?] are based
on so-called q-type or interactive assumptions that are primarily justified based on the
Generic Group model. An open question since Groth's seminal work [?] (only partially
answered by [?]) is to construct an SPS scheme that is both efficient - in particular
constant-size in the number of signed group elements - and that is based on assumptions
that are as weak as those required by the GS proof system itself.

Our contribution. Our first contribution consists of two generic constructions for
chosen message attack (CMA) secure signatures that combine variations of one-time
signatures and signatures secure against random message attacks (RMA). Both
constructions inherit the structure-preserving and constant-size properties from the underlying
components. The second contribution consists of concrete instantiations of these
components, which result in constant-size structure-preserving signature schemes that
produce signatures consisting of only 11 to 17 group elements and that rely only on
basic assumptions such as Decisional-Linear (DLIN) for symmetric bilinear groups and
analogues of DDH and DLIN for asymmetric bilinear groups. To our knowledge, these
are the first constant-size structure-preserving signature schemes that eliminate the use
of q-type assumptions while achieving reasonable efficiency.

We instantiate the first generic construction for symmetric (Type-I) and the second
for asymmetric (Type-III) pairing groups. See Table [?] in Section [?] for the summary of
efficiency of the resulting schemes. We give more details on our generic constructions
and their instantiations:

- The first generic construction (SIG1) combines a new variation of one-time sig- 
natures which we call tagged one-time signatures and signatures secure against 
random message attacks (RMA). A tagged one-time signature scheme, denoted by 
TOS, is a signature scheme that attaches a fresh tag to a signature. It is unforge- 
able with respect to tags that are used only once. In our construction, a message is 
signed with our TOS scheme using a fresh random tag, and then the tag is signed 
with the second signature scheme, denoted by rSIG. Since the rSIG scheme only 
signs random tags, RMA-security is sufficient. 

- The second generic construction (SIG2) combines partial one-time signatures and
signatures secure against extended random message attacks (XRMA). The latter is
a novel notion that we explain below. Partial one-time signatures, denoted by POS,
are one-time signatures for which only a part of the one-time key is renewed for
every signing operation. They were first introduced by Bellare and Shoup [?] under
the name of two-tier signatures. In our construction, a message is signed with the
POS scheme and then the random one-time public-key is certified by the second
signature scheme, denoted by xSIG. The difference between a TOS scheme and
a POS scheme is that a one-time public-key is associated with a one-time secret-key.
Since the one-time secret-key is needed for signing, it must be known to the
reduction in the security proof. XRMA-security guarantees that xSIG is unforgeable
even if the adversary is given auxiliary information associated with the randomly
chosen messages (namely, the random coins used for selecting the message). The auxiliary
information facilitates access to the one-time secret-key by the reduction.

- To instantiate SIG1, we construct structure-preserving TOS and rSIG signature 
schemes based on DLIN over Type-I bilinear groups. Our TOS scheme yields 
constant-size signatures and tags. The resulting SIG1 scheme is structure-preserving, 
produces signatures consisting of 17 group elements, and relies solely on the DLIN 
assumption. 

- To instantiate SIG2, we construct structure-preserving POS and xSIG signature
schemes based on assumptions that are analogues of DDH and DLIN in Type-III
bilinear groups. The resulting SIG2 scheme is structure-preserving, produces
signatures consisting of 11 group elements for uniliteral messages in a base group or
14 group elements for biliteral messages from both base groups.
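
The wiring of the first generic construction (SIG1) described in the list above can be sketched as follows. This is a minimal sketch under loud assumptions: `ToyScheme` is a hypothetical, insecure HMAC placeholder whose "public" and secret keys coincide, standing in for both TOS and rSIG; it is neither structure-preserving nor a real public-key signature. The choice to output the triple (tag, TOS signature, rSIG signature) is also an assumption made for illustration. Only the composition order follows the text: sign the message under a fresh tag, then sign the tag.

```python
import hashlib
import hmac
import secrets

class ToyScheme:
    """Hypothetical insecure placeholder for TOS and rSIG (HMAC, shared key)."""

    @staticmethod
    def key():
        k = secrets.token_bytes(32)
        return k, k  # (public key, secret key) coincide in this toy

    @staticmethod
    def sign(sk, data):
        return hmac.new(sk, data, hashlib.sha256).digest()

    @staticmethod
    def vrf(pk, data, sig):
        return hmac.compare_digest(ToyScheme.sign(pk, data), sig)

TAG_BYTES = 16  # fixed tag length keeps the encoding tag || msg unambiguous

def tos_tag():
    """TOS.Tag: a fresh random tag, to be used for a single signature."""
    return secrets.token_bytes(TAG_BYTES)

def sig1_key():
    pk_t, sk_t = ToyScheme.key()  # long-term TOS key pair
    vk_r, sk_r = ToyScheme.key()  # rSIG key pair
    return (pk_t, vk_r), (sk_t, sk_r)

def sig1_sign(sk, msg):
    sk_t, sk_r = sk
    tag = tos_tag()
    sigma_t = ToyScheme.sign(sk_t, tag + msg)  # TOS signs msg under the tag
    sigma_r = ToyScheme.sign(sk_r, tag)        # rSIG signs only the random tag
    return tag, sigma_t, sigma_r

def sig1_vrf(vk, msg, sig):
    pk_t, vk_r = vk
    tag, sigma_t, sigma_r = sig
    return (ToyScheme.vrf(pk_t, tag + msg, sigma_t)
            and ToyScheme.vrf(vk_r, tag, sigma_r))
```

Because rSIG only ever sees tags that the signer samples at random, the sketch mirrors why RMA-security of the second component suffices in the construction. SIG2 follows the same pattern with a one-time public key in place of the tag.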

The role of partial one-time signatures is to compress a message into a constant number
of random group elements. This observation is interesting in light of [?], which implies
the impossibility of constructing collision resistant and shrinking structure-preserving
hash functions, which could immediately yield constant-size signatures. Our (extended)
RMA-secure signature schemes are structure-preserving variants of Waters'
dual-signature scheme [?]. In general, the difficulty of constructing CMA-secure SPS
arises from the fact that the exponents of the group elements chosen by the adversary as
a message are not known to the reduction in the security proof. On the other hand, for
RMA security, it is the challenger that chooses the message and therefore the exponents
can be known in reductions. This is the crucial advantage for constructing (extended)
RMA-secure structure-preserving signature schemes based on Waters' dual-signature
scheme.

Finally, we mention a few new applications. Among these is the achievement of
a drastic performance improvement when using our partial one-time signatures in the
work by Hofheinz and Jager [?] to construct CCA-secure public-key encryption
schemes with a proof of security that tightly reduces to DLIN or SXDH.

Related Works. Even, Goldreich and Micali proposed a generic framework (the 
EGM framework) that combines a one-time signature scheme and a signature scheme 
that is secure against non-adaptive chosen message attacks (NACMA) to construct a 
signature scheme that is secure against adaptive chosen message attacks (CMA). 

In fact, our generic constructions can be seen as refinements of the EGM framework. 
There are two reasons why the original framework falls short for our purpose. The first 
is that relaxing to NACMA does not seem a big help in constructing efficient structure- 
preserving signatures since the messages are still under the control of the adversary and 
the exponents of the messages are not known to the reduction algorithm in the security
proof. As mentioned above, resorting to (extended) RMA is a great help in this regard.
Even, Goldreich and Micali also showed in [?] that CMA-secure signatures exist iff
RMA-secure signatures exist. The proof, however, does not follow their framework and
their impractical construction is mainly a feasibility result. In fact, we argue that
RMA-security alone is not sufficient for the original EGM framework. As mentioned above, the necessity of
XRMA security arises in the reduction that uses RMA-security to argue security of the 
ordinary signature scheme, as the reduction not only needs to know the random one- 
time public-keys, but also their corresponding one-time secret keys in order to generate 
the one-time signature components of the signatures. The auxiliary information in the 
XRMA definition facilitates access to these secret keys. Similarly, tagged one-time sig- 
natures avoid this problem as tags do not have associated secret values. The second 
reason that the EGM approach is not quite suited to our task is that the EGM frame- 
work produces signatures that are linear in the public-key size of the one-time signature 
scheme. Here, tagged or partial one-time signature schemes come in handy as they al- 
low the signature size to be only linear in the size of the part of the public key that is 
updated. Thus, to obtain constant-size signatures, we require the one-time part to be 
constant-size. 

Hofheinz and Jager [?] constructed an SPS scheme by following the EGM
framework. The resulting scheme allows a tight security reduction to DLIN but the size of
signatures depends logarithmically on the number of signing operations as their
NACMA-secure scheme is tree-based like the Goldwasser-Micali-Rivest signature scheme [?].
Chase and Kohlweiss [?] and Camenisch, Dubovitskaya, and Haralambiev [?]
constructed SPS schemes with security based on DLIN that improve the performance of
Groth's scheme [?] by several orders of magnitude. The size of the resulting signatures,
however, is still linear in the number of signed group elements, and an order
of magnitude larger than in our constructions. Camenisch, Dubovitskaya, and
Haralambiev constructed a constant-size SPS scheme based on simple assumptions over
composite-order groups [?].

Full Version. In this extended abstract, we omit complete proofs for lack of space.
Please see the full version on the Cryptology ePrint Archive
(2012/285).

2 Preliminaries 

Notation. Appending element $y$ to a sequence $X = (x_1, \ldots, x_n)$ is denoted by $(X, y)$,
i.e., $(X, y) = (x_1, \ldots, x_n, y)$. When algorithm $A$ is defined for input $x$ and output $y$,
notation $\vec{y} \leftarrow A(\vec{x})$ for $\vec{x} := \{x_1, \ldots, x_n\}$ means that $y_i \leftarrow A(x_i)$ is executed for
$i = 1, \ldots, n$ and $\vec{y}$ is set as $\vec{y} := (y_1, \ldots, y_n)$. For a set $X$, notation $a \leftarrow X$ denotes
uniform sampling from $X$. Independent multiple sampling from the same set $X$ is
denoted by $a, b, c, \ldots \leftarrow X$.

Bilinear groups. Let $\mathcal{G}$ be a bilinear group generator that takes security parameter $1^\lambda$
and outputs a description of bilinear groups $\Lambda := (p, \mathbb{G}_1, \mathbb{G}_2, \mathbb{G}_T, e)$, where $\mathbb{G}_1$, $\mathbb{G}_2$
and $\mathbb{G}_T$ are groups of prime order $p$, and $e$ is an efficient and non-degenerate bilinear
map $\mathbb{G}_1 \times \mathbb{G}_2 \to \mathbb{G}_T$. Following the terminology in [?], this is a Type-III pairing. In
the Type-III setting, $\mathbb{G}_1 \neq \mathbb{G}_2$ and there is no efficient mapping between the groups in
either direction. In the Type-III setting, we often use twin group elements, $(G^a, \hat{G}^a) \in
\mathbb{G}_1 \times \mathbb{G}_2$ for some bases $G$ and $\hat{G}$. For $X$ in $\mathbb{G}_1$, notation $\hat{X}$ denotes an element
in $\mathbb{G}_2$ such that $\log X = \log \hat{X}$, where logarithms are with respect to default bases that are
uniformly chosen once for all and implicitly associated to $\Lambda$. Should their relation be
explicitly stated, we write $X \sim \hat{X}$. We count the number of group elements to measure
the size of cryptographic objects such as keys, messages, and signatures. For Type-III
groups, we denote the size by $(x, y)$ when it consists of $x$ and $y$ elements from $\mathbb{G}_1$ and
$\mathbb{G}_2$, respectively. We refer to the Type-I setting when $\mathbb{G}_1 = \mathbb{G}_2$ (i.e., there are efficient
mappings in both directions). This is also called the symmetric setting. In this case, we
define $\Lambda := (p, \mathbb{G}, \mathbb{G}_T, e)$. When we need to be specific, the group description yielded
by $\mathcal{G}$ will be written as $\Lambda_{asym}$ and $\Lambda_{sym}$.

Assumptions. We first define computational and decisional Diffie-Hellman assumptions
(CDH$_1$, DDH$_1$) and the decisional linear assumption (DLIN$_1$) for Type-III bilinear groups.
The corresponding more standard assumptions, CDH, DDH, and DLIN, in Type-I groups
are obtained by setting $\mathbb{G}_1 = \mathbb{G}_2$ and $G = \hat{G}$ in the respective definitions.

Definition 1 (Computational co-Diffie-Hellman Assumption: CDH$_1$)

The CDH$_1$ assumption holds if, for any p.p.t. algorithm $\mathcal{A}$, the probability
$\mathrm{Adv}^{\text{co-cdh}}_{\mathcal{G}}(\mathcal{A}) := \Pr[\, Z = G^{xy} \mid \Lambda \leftarrow \mathcal{G}(1^\lambda);\ x, y \leftarrow \mathbb{Z}_p;\ Z \leftarrow \mathcal{A}(\Lambda, G, G^x, G^y, \hat{G}, \hat{G}^x, \hat{G}^y) \,]$
is negligible in $\lambda$.

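Definition 2 (Decisional Diffie-Hellman Assumption in $\mathbb{G}_1$: DDH$_1$)

Given $\Lambda \leftarrow \mathcal{G}(1^\lambda)$, $G \leftarrow \mathbb{G}_1^*$, $(G^x, G^y, Z_b) \in \mathbb{G}_1^3$ where $Z_1 = G^{xy}$, $Z_0 \leftarrow \mathbb{G}_1$ for
random $x$ and $y$, any p.p.t. algorithm $\mathcal{A}$ decides whether $b = 1$ or $0$ only with advantage
$\mathrm{Adv}^{\text{ddh1}}_{\mathcal{G}}(\mathcal{A})$ that is negligible in $\lambda$.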

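Definition 3 (Decisional Linear Assumption in $\mathbb{G}_1$: DLIN$_1$)

Given $\Lambda \leftarrow \mathcal{G}(1^\lambda)$, $(G_1, G_2, G_3) \leftarrow \mathbb{G}_1^{*3}$ and $(G_1^x, G_2^y, Z_b)$ where $Z_1 = G_3^{x+y}$ and
$Z_0 = G_3^z$ for random $x, y, z \in \mathbb{Z}_p$, any p.p.t. algorithm $\mathcal{A}$ decides whether $b = 1$ or $0$
only with advantage $\mathrm{Adv}^{\text{dlin1}}_{\mathcal{G}}(\mathcal{A})$ that is negligible in $\lambda$.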
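
To make the shape of a decisional linear instance concrete, the following sketch samples one in a tiny group. All parameters and names are illustrative assumptions: the group is the order-11 subgroup of the integers modulo 23, which has no pairing and offers no security, so the sketch only shows what the challenger hands to the distinguisher.

```python
import secrets

Q = 23    # toy modulus; the nonzero residues mod 23 contain a subgroup of order 11
P = 11    # prime group order p
BASE = 2  # 2 has order 11 modulo 23, so it generates that subgroup

def random_generator():
    """Random non-identity element of the order-11 subgroup."""
    return pow(BASE, 1 + secrets.randbelow(P - 1), Q)

def dlin_instance(b):
    """Sample (G1, G2, G3, G1^x, G2^y, Z_b) together with the secret exponents.

    Z_1 = G3^(x+y) is the 'linear' element; Z_0 = G3^z is a random one.
    """
    g1, g2, g3 = (random_generator() for _ in range(3))
    x, y, z = (secrets.randbelow(P) for _ in range(3))
    if b == 1:
        zb = pow(g3, (x + y) % P, Q)
    else:
        zb = pow(g3, z, Q)
    instance = (g1, g2, g3, pow(g1, x, Q), pow(g2, y, Q), zb)
    return instance, (x, y, z)
```

A distinguisher receives only `instance` and must guess `b`; the exponents are returned here purely so that the relation can be checked.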

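For DDH$_1$ and DLIN$_1$, we define an analogous assumption in $\mathbb{G}_2$ (DDH$_2$) by
swapping $\mathbb{G}_1$ and $\mathbb{G}_2$ in the respective definitions. In Type-III bilinear groups, it is assumed
that both DDH$_1$ and DDH$_2$ hold simultaneously. The assumption is called the
symmetric external Diffie-Hellman assumption (SXDH), and we define advantage $\mathrm{Adv}^{\text{sxdh}}_{\mathcal{G}}$
by $\mathrm{Adv}^{\text{sxdh}}_{\mathcal{G}}(\mathcal{A}) := \mathrm{Adv}^{\text{ddh1}}_{\mathcal{G}}(\mathcal{A}) + \mathrm{Adv}^{\text{ddh2}}_{\mathcal{G}}(\mathcal{A})$. We extend DLIN in a similar manner as
DDH and SXDH.

Definition 4 (External Decision Linear Assumption in $\mathbb{G}_1$: XDLIN$_1$)

Given $\Lambda \leftarrow \mathcal{G}(1^\lambda)$, $(G_1, G_2, G_3) \leftarrow \mathbb{G}_1^{*3}$ and $(G_1^x, G_2^y, \hat{G}_1, \hat{G}_2, \hat{G}_3, \hat{G}_1^x, \hat{G}_2^y, Z_b)$ where
$(G_1, G_2, G_3) \sim (\hat{G}_1, \hat{G}_2, \hat{G}_3)$, $Z_1 = G_3^{x+y}$, and $Z_0 = G_3^z$ for random $x, y, z \in \mathbb{Z}_p$,
any p.p.t. algorithm $\mathcal{A}$ decides whether $b = 1$ or $0$ only with advantage $\mathrm{Adv}^{\text{xdlin1}}_{\mathcal{G}}(\mathcal{A})$ that
is negligible in $\lambda$.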

The XDLIN$_1$ assumption is equivalent to the DLIN$_1$ assumption in the generic bilinear
group model [?] where one can simulate the extra elements, $\hat{G}_1, \hat{G}_2, \hat{G}_3, \hat{G}_1^x, \hat{G}_2^y$,
in XDLIN$_1$ from $G_1, G_2, G_3, G_1^x, G_2^y$ in DLIN$_1$. We define the XDLIN$_2$ assumption
analogously by giving $\hat{G}_3^{x+y}$ or $\hat{G}_3^z$ as $Z_b$ to $\mathcal{A}$ instead. Then we define the simultaneous
external DLIN assumption, SXDLIN, that assumes that both XDLIN$_1$ and XDLIN$_2$
hold at the same time. By $\mathrm{Adv}^{\text{xdlin2}}_{\mathcal{G}}$ ($\mathrm{Adv}^{\text{sxdlin}}_{\mathcal{G}}$, resp.), we denote the advantage function
for XDLIN$_2$ (and SXDLIN, resp.).

Definition 5 (Double Pairing Assumption in $\mathbb{G}_1$ [4]: DBP$_1$)

Given $\Lambda \leftarrow \mathcal{G}(1^\lambda)$ and $(G_z, G_r) \leftarrow \mathbb{G}_1^{*2}$, any p.p.t. algorithm $\mathcal{A}$ outputs $(Z, R) \in \mathbb{G}_2^{*2}$
that satisfies $1 = e(G_z, Z)\, e(G_r, R)$ only with probability $\mathrm{Adv}^{\text{dbp1}}_{\mathcal{G}}(\mathcal{A})$ that is negligible
in $\lambda$.

The double pairing assumption in $\mathbb{G}_2$ (DBP$_2$) is defined in the same manner by
swapping $\mathbb{G}_1$ and $\mathbb{G}_2$. It is known that DBP$_1$ (DBP$_2$, resp.) is implied by DDH$_1$ (DDH$_2$,
resp.) and the reduction is tight [?]. Note that the double pairing assumption does not
hold in Type-I groups since $Z = G_r$, $R = G_z^{-1}$ is a trivial solution. The following
analogous assumption will be useful in Type-I groups.
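
The trivial solution can be checked in one line. The derivation below uses only bilinearity and the symmetry $e(A, B) = e(B, A)$, which holds for a Type-I pairing on a cyclic group; it is a worked check added for illustration, not a statement from the paper.

```latex
% Substituting Z = G_r and R = G_z^{-1} into the double pairing equation:
\[
  e(G_z, Z)\, e(G_r, R)
  = e(G_z, G_r)\, e(G_r, G_z^{-1})
  = e(G_z, G_r)\, e(G_r, G_z)^{-1}
  = e(G_z, G_r)\, e(G_z, G_r)^{-1}
  = 1 .
\]
% The second step is bilinearity; the third is symmetry of the Type-I pairing.
```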

Definition 6 (Simultaneous Double Pairing Assumption: SDP)

Given $\Lambda \leftarrow \mathcal{G}(1^\lambda)$ and $(G_z, G_r, H_z, H_s) \leftarrow \mathbb{G}^{*4}$, any p.p.t. algorithm $\mathcal{A}$ outputs
$(Z, R, S) \in \mathbb{G}^{*3}$ that satisfies $1 = e(G_z, Z)\, e(G_r, R) \ \wedge\ 1 = e(H_z, Z)\, e(H_s, S)$
only with probability $\mathrm{Adv}^{\text{sdp}}_{\mathcal{G}}(\mathcal{A})$ that is negligible in $\lambda$.

As shown in [?] for the Type-I setting, the simultaneous double pairing assumption
holds for $\mathcal{G}$ if the decisional linear assumption holds for $\mathcal{G}$.

3 Definitions 

Common setup. All building blocks make use of a common setup algorithm Setup that
takes the security parameter $1^\lambda$ and outputs global parameters $gk$ that are given to all
other algorithms. Usually $gk$ consists of a description $\Lambda$ of a bilinear group setup and a
default generator for each group. In this paper, we include several additional generators
in $gk$ for technical reasons. Note that when the resulting signature scheme is used in
multi-user applications, different additional generators need to be assigned to individual
users or one needs to fall back on the common reference string model, whereas $\Lambda$ and
the default generators can be shared. Thus we count the size of $gk$ when we assess the
efficiency of concrete instantiations. For ease of notation, we make $gk$ implicit except
w.r.t. key generation algorithms.

Signature schemes. We use the following syntax for signature schemes suitable for the
multi-user and multi-algorithm setting. The key generation function takes global
parameter $gk$ generated by Setup (usually it takes security parameter $1^\lambda$), and the message
space $\mathcal{M}$ is determined solely from $gk$ (usually it is determined from a public-key).

Definition 7 (Signature Scheme). A signature scheme SIG is a tuple of three
polynomial-time algorithms (Key, Sign, Vrf) such that:

- $\mathsf{SIG.Key}(gk)$ generates a long-term public-key $vk$ and a secret-key $sk$.

- $\mathsf{SIG.Sign}(sk, msg)$ takes $sk$ and message $msg$ and outputs signature $\sigma$.

- $\mathsf{SIG.Vrf}(vk, msg, \sigma)$ outputs 1 for acceptance or 0 for rejection.

Correctness requires that $1 = \mathsf{SIG.Vrf}(vk, msg, \sigma)$ holds for any $gk$ generated by
Setup, any keys generated as $(vk, sk) \leftarrow \mathsf{SIG.Key}(gk)$, any message $msg \in \mathcal{M}$, and
any signature $\sigma \leftarrow \mathsf{SIG.Sign}(sk, msg)$.

Definition 8 (Attack Game (ATK)). Let $O_{sig}$ be an oracle and $\mathcal{A}$ be an oracle
algorithm. We define a meta attack game as a sequence of executions of algorithms as
follows:

$\mathrm{ATK}(\mathcal{A}, \lambda) = [\, gk \leftarrow \mathsf{Setup}(1^\lambda),\ pre \leftarrow \mathcal{A}(gk),\ (vk, sk) \leftarrow \mathsf{SIG.Key}(gk),\ (\sigma^\dagger, msg^\dagger) \leftarrow \mathcal{A}^{O_{sig}}(vk) \,]$

Adversary $\mathcal{A}$ commits to $pre$, which is typically a set of messages, in the first run. This
formulation is to capture non-adaptive attacks. It is implicit that state information
is passed to the second run of $\mathcal{A}$. Let $Q_m$ be the set of messages for which $\mathcal{A}$ requests
signatures from its oracle before outputting the resulting forgery. The output of ATK is
$(vk, \sigma^\dagger, msg^\dagger, Q_m)$.

Definition 9 (Adaptive Chosen-Message Attack (CMA)). Adaptive chosen message
attack security is defined by the attack game ATK where $pre$ is empty and oracle $O_{sig}$ is
the signing oracle that, on receiving a message $msg$, performs $\sigma \leftarrow \mathsf{SIG.Sign}(sk, msg)$,
and returns $\sigma$.

Definition 10 (Random Message Attack (RMA) [?]). Random message attack
security is defined by the attack game ATK where $pre$ is empty and oracle $O_{sig}$ is the
following: on receiving a request, it chooses $msg$ uniformly from $\mathcal{M}$ defined by $gk$,
computes signature $\sigma \leftarrow \mathsf{SIG.Sign}(sk, msg)$, and returns $(\sigma, msg)$.

Let MSGGen be a uniform message generator. It is a probabilistic algorithm that takes
$gk$ and outputs $msg \in \mathcal{M}$ that is distributed uniformly over $\mathcal{M}$. Furthermore, MSGGen
outputs auxiliary information $aux$ that may give a hint about the random coins used for
selecting $msg$.

Definition 11 (Extended Random Message Attack (XRMA)). Extended random message
attack security is defined by the attack game ATK where $pre$ is empty and oracle $O_{sig}$ is the
following. On receiving a request, it runs $(msg, aux) \leftarrow \mathsf{MSGGen}(gk)$, computes
$\sigma \leftarrow \mathsf{SIG.Sign}(sk, msg)$, and returns $(\sigma, msg, aux)$.

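Definition 12 (Unforgeability against ATK). Signature scheme SIG is unforgeable
against attack ATK (UF-ATK), where ATK $\in$ {CMA, RMA, XRMA}, if for all p.p.t.
oracle algorithms $\mathcal{A}$ the advantage function $\mathrm{Adv}^{\text{uf-atk}}_{\mathsf{SIG}}(\mathcal{A}) := \Pr[\, msg^\dagger \notin Q_m \ \wedge\ 1 =
\mathsf{SIG.Vrf}(vk, msg^\dagger, \sigma^\dagger) \mid (vk, \sigma^\dagger, msg^\dagger, Q_m) \leftarrow \mathrm{ATK}(\mathcal{A}, \lambda) \,]$ is negligible in $\lambda$.

Fact 1. UF-CMA $\Rightarrow$ UF-XRMA $\Rightarrow$ UF-RMA, i.e., $\mathrm{Adv}^{\text{uf-cma}}_{\mathsf{SIG}}(\mathcal{A}) \geq \mathrm{Adv}^{\text{uf-xrma}}_{\mathsf{SIG}}(\mathcal{A}) \geq
\mathrm{Adv}^{\text{uf-rma}}_{\mathsf{SIG}}(\mathcal{A})$.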
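
The attack game and its three oracles can be sketched as a small test harness. Everything concrete below is an assumption for illustration: the signature scheme is textbook RSA with a fixed toy modulus (insecure and not structure-preserving), the message space is the 256 single-byte strings, and the auxiliary information in XRMA mode is simply the index used to pick the message. The harness only mirrors the bookkeeping of the definitions, with $pre$ empty: the oracle records queried messages in $Q_m$, and a forgery counts only for a message outside $Q_m$.

```python
import hashlib
import secrets

# Toy textbook-RSA parameters (N = 61 * 53); illustrative only, insecure.
N, E, D = 3233, 17, 2753
MESSAGE_SPACE = [bytes([i]) for i in range(256)]

def digest(msg):
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % N

def key():
    return (N, E), (N, D)  # (vk, sk)

def sign(sk, msg):
    n, d = sk
    return pow(digest(msg), d, n)

def vrf(vk, msg, sigma):
    n, e = vk
    return pow(sigma, e, n) == digest(msg)

def attack_game(adversary, mode):
    """Run ATK with empty pre; mode is 'CMA', 'RMA' or 'XRMA'."""
    vk, sk = key()
    queried = []  # Q_m

    def osig(msg=None):
        aux = None
        if mode != "CMA":
            # RMA / XRMA: the oracle, not the adversary, picks the message.
            coins = secrets.randbelow(len(MESSAGE_SPACE))
            msg = MESSAGE_SPACE[coins]
            aux = coins if mode == "XRMA" else None
        queried.append(msg)
        return sign(sk, msg), msg, aux

    sigma_f, msg_f = adversary(vk, osig)
    return vk, sigma_f, msg_f, queried

def is_forgery(outcome):
    """Winning condition: fresh message and a verifying signature."""
    vk, sigma_f, msg_f, queried = outcome
    return msg_f not in queried and vrf(vk, msg_f, sigma_f)

def replay_adversary(vk, osig):
    """Outputs a signature it obtained from the oracle; never a valid forgery."""
    sigma, msg, _aux = osig(b"\x07")
    return sigma, msg
```

Running `is_forgery(attack_game(replay_adversary, "CMA"))` returns `False`: the replayed signature verifies, but its message lies in $Q_m$.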

Partial one-time and tagged one-time signatures. Partial one-time signatures, also
known as two-tier signatures [?], are a variation of one-time signatures where only
part of the public-key must be updated for every signing, while the remaining part can
be persistent.

Definition 13 (Partial One-Time Signature Scheme). A partial one-time signature
scheme POS is a set of polynomial-time algorithms POS.{Key, Update, Sign, Vrf}.

- $\mathsf{POS.Key}(gk)$ generates a long-term public-key $pk$ and a secret-key $sk$. The message
space $\mathcal{M}_o$ is associated with $pk$. (Recall that we require that $\mathcal{M}_o$ be completely
defined by $gk$.)

- $\mathsf{POS.Update}()$ takes $gk$ as implicit input, and outputs a pair of one-time keys
$(opk, osk)$. We denote the space for $opk$ by $\mathcal{K}_{opk}$.

- $\mathsf{POS.Sign}(sk, msg, osk)$ outputs a signature $\sigma$ on message $msg$ based on $sk$ and
$osk$.

- $\mathsf{POS.Vrf}(pk, opk, msg, \sigma)$ outputs 1 for acceptance, or 0 for rejection.

For correctness, it is required that $1 = \mathsf{POS.Vrf}(pk, opk, msg, \sigma)$ holds except for
negligible probability for any $gk$, $pk$, $opk$, $\sigma$, and $msg \in \mathcal{M}_o$, such that $gk \leftarrow \mathsf{Setup}(1^\lambda)$,
$(pk, sk) \leftarrow \mathsf{POS.Key}(gk)$, $(opk, osk) \leftarrow \mathsf{POS.Update}()$, $\sigma \leftarrow \mathsf{POS.Sign}(sk, msg, osk)$.
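
The four-algorithm syntax can be exercised with a classical hash-based one-time signature. The sketch below uses Lamport signatures over SHA-256 as a degenerate POS whose long-term keys are empty, so the entire key is the one-time part. This is an assumption chosen purely to illustrate the interface: it is not structure-preserving, its one-time keys are far from constant-size, and it is not one of the instantiations in this paper. All function names are hypothetical.

```python
import hashlib
import secrets

BITS = 256  # one key pair of preimages per bit of the message digest

def H(data):
    return hashlib.sha256(data).digest()

def msg_bits(msg):
    """The 256 bits of SHA-256(msg), least significant bit first."""
    d = int.from_bytes(H(msg), "big")
    return [(d >> i) & 1 for i in range(BITS)]

def pos_key():
    """POS.Key: the long-term key pair is empty in this degenerate example."""
    return b"", b""

def pos_update():
    """POS.Update: fresh one-time keys (opk, osk)."""
    osk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(BITS)]
    opk = [(H(a), H(b)) for a, b in osk]
    return opk, osk

def pos_sign(sk, msg, osk):
    """POS.Sign: reveal one preimage per digest bit; osk must not be reused."""
    return [osk[i][bit] for i, bit in enumerate(msg_bits(msg))]

def pos_vrf(pk, opk, msg, sigma):
    """POS.Vrf: each revealed value must hash to the matching half of opk."""
    bits = msg_bits(msg)
    return len(sigma) == BITS and all(
        H(s) == opk[i][bit] for i, (s, bit) in enumerate(zip(sigma, bits))
    )
```

A signer calls `pos_update()` once per message; verifying the resulting signature against a different message fails because at least one digest bit differs.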

A tagged one-time signature scheme is a signature scheme whose signing function in 
addition to the long-term secret key takes a tag as input. A tag is one-time, i.e., it must 
be different for every signing. 

Definition 14 (Tagged One-Time Signature Scheme). A tagged one-time signature
scheme TOS is a set of polynomial-time algorithms TOS.{Key, Tag, Sign, Vrf}.

- TOS.Key(gk) generates a long-term public-key pk and a secret-key sk. The message
space $\mathcal{M}_t$ is associated with pk.

- TOS.Tag() takes gk as implicit input and outputs tag. By $\mathcal{T}$, we denote the space
for tag.

- TOS.Sign(sk, msg, tag) outputs signature $\sigma$ for message msg based on sk and
tag.

- TOS.Vrf(pk, tag, msg, $\sigma$) outputs 1 for acceptance, or 0 for rejection.

Correctness requires that 1 = TOS.Vrf(pk, tag, msg, $\sigma$) holds except with negligible
probability for any gk, pk, tag, $\sigma$, and msg $\in \mathcal{M}_t$ such that gk $\leftarrow$ Setup($1^\lambda$),
(pk, sk) $\leftarrow$ TOS.Key(gk), tag $\leftarrow$ TOS.Tag(), $\sigma \leftarrow$ TOS.Sign(sk, msg, tag).

A TOS scheme is a POS scheme for which tag = osk = opk. We can thus give a security
notion for POS schemes that also applies to TOS schemes by reading Update = Tag
and tag = osk = opk.


Definition 15 (Unforgeability against One-Time Adaptive Chosen-Message Attacks).
A partial one-time signature scheme is unforgeable against one-time adaptive
chosen message attacks (OT-CMA) if for all p.p.t. oracle algorithms A the advantage
function $\mathrm{Adv}^{\text{ot-cma}}_{\mathsf{POS},A}(\lambda)$ is negligible in $\lambda$, where

$$\mathrm{Adv}^{\text{ot-cma}}_{\mathsf{POS},A}(\lambda) := \Pr\left[
\begin{array}{l}
\exists\, (opk, msg, \sigma) \in Q_m \text{ s.t.} \\
opk^\dagger = opk \,\wedge\, msg^\dagger \neq msg \,\wedge \\
1 = \mathsf{POS.Vrf}(pk, opk^\dagger, \sigma^\dagger, msg^\dagger)
\end{array}
\;\middle|\;
\begin{array}{l}
gk \leftarrow \mathsf{Setup}(1^\lambda), \\
(pk, sk) \leftarrow \mathsf{POS.Key}(gk), \\
(opk^\dagger, \sigma^\dagger, msg^\dagger) \leftarrow A^{O_t, O_{sig}}(pk)
\end{array}
\right].$$

$Q_m$ is initially an empty list. $O_t$ is the one-time key generation oracle that, on
receiving a request, invokes a fresh session j, performs $(opk_j, osk_j) \leftarrow$ POS.Update(),
and returns $opk_j$. $O_{sig}$ is the signing oracle that, on receiving a message $msg_j$ for
session j, performs $\sigma_j \leftarrow$ POS.Sign(sk, $msg_j$, $osk_j$), returns $\sigma_j$ to A, and records
$(opk_j, msg_j, \sigma_j)$ to the list $Q_m$. $O_{sig}$ works only once for every session. Strong
unforgeability is defined as well by replacing condition $msg^\dagger \neq msg$ with
$(msg^\dagger, \sigma^\dagger) \neq (msg, \sigma)$.

We define a non-adaptive variant (OT-NACMA) of the above notion by integrating $O_t$
into $O_{sig}$ so that $opk_j$ and $\sigma_j$ are returned to A at the same time. Namely, A must
submit $msg_j$ before seeing $opk_j$. If a scheme is secure in the sense of OT-CMA, the
scheme is also secure in the sense of OT-NACMA. If a scheme is strongly unforgeable,
it is unforgeable as well. By $\mathrm{Adv}^{\text{ot-nacma}}_{\mathsf{POS},A}(\lambda)$ we denote the advantage of A in this
non-adaptive case. For TOS, we use the same notations, OT-CMA and OT-NACMA, and
define advantage functions $\mathrm{Adv}^{\text{ot-cma}}_{\mathsf{TOS},A}$ and $\mathrm{Adv}^{\text{ot-nacma}}_{\mathsf{TOS},A}$ accordingly. For strong
unforgeability, we use the labels sot-cma and sot-nacma.

We define a condition that is relevant for coupling random message secure signature 
schemes with partial one-time and tagged one-time signature schemes in later sections. 

Definition 16 (Tag/One-time Public-Key Uniformity). TOS is called uniform-tag if
TOS.Tag outputs a tag that is uniformly distributed over tag space $\mathcal{T}$. Similarly, POS
is called uniform-key if POS.Update outputs opk that is uniformly distributed over key
space $\mathcal{K}_{opk}$.

Structure-preserving signatures. A signature scheme is structure-preserving over a
bilinear group $\Lambda$ if public-keys, signatures, and messages are all base group elements
of $\Lambda$, and the verification only evaluates pairing product equations. Similarly, POS
schemes are structure-preserving if their public-keys, signatures, messages, and tags or 
one-time public-keys consist of base group elements and the verification only evaluates 
pairing product equations. 

4 Generic Constructions 

4.1 SIG1: Combining Tagged One-Time and RMA-Secure Signatures 

Let rSIG be a signature scheme with message space $\mathcal{M}_r$, and TOS be a tagged one-time
signature scheme with tag space $\mathcal{T}$ such that $\mathcal{M}_r = \mathcal{T}$. We construct a signature scheme
SIG1 from rSIG and TOS. Let gk be a global parameter generated by Setup($1^\lambda$).

- SIG1.Key(gk): Run $(pk_t, sk_t) \leftarrow$ TOS.Key(gk), $(vk_r, sk_r) \leftarrow$ rSIG.Key(gk).
Output $vk := (pk_t, vk_r)$ and $sk := (sk_t, sk_r)$.

- SIG1.Sign(sk, msg): Parse sk into $(sk_t, sk_r)$. Run tag $\leftarrow$ TOS.Tag(),
$\sigma_t \leftarrow$ TOS.Sign($sk_t$, msg, tag), $\sigma_r \leftarrow$ rSIG.Sign($sk_r$, tag). Output $\sigma := (tag, \sigma_t, \sigma_r)$.

- SIG1.Vrf(vk, $\sigma$, msg): Parse vk and $\sigma$ accordingly. Output 1 if
1 = TOS.Vrf($pk_t$, tag, $\sigma_t$, msg) and 1 = rSIG.Vrf($vk_r$, $\sigma_r$, tag). Output 0 otherwise.
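
The data flow of SIG1 can be sketched in a few lines of Python. This is only an illustration of how the three algorithms compose, not an implementation of the paper's schemes: the classes ToyMac, ToyTOS and ToyRSIG are hypothetical placeholders built from HMAC, in which the verification key equals the signing key, so they offer no public-key security and are not structure-preserving. Only the class SIG1 mirrors the construction above.

```python
import hashlib
import hmac
import os


class ToyMac:
    """Insecure placeholder: an HMAC whose verification key equals the signing key."""

    @staticmethod
    def key():
        k = os.urandom(32)
        return k, k  # (verification key, signing key), identical in this toy

    @staticmethod
    def mac(k, *parts):
        h = hmac.new(k, digestmod=hashlib.sha256)
        for part in parts:
            h.update(len(part).to_bytes(4, "big") + part)  # length-prefix each part
        return h.digest()


class ToyTOS:
    """Hypothetical stand-in for TOS.{Key, Tag, Sign, Vrf}."""

    def key(self):
        return ToyMac.key()

    def tag(self):
        return os.urandom(16)  # fresh tag for every signature

    def sign(self, sk, msg, tag):
        return ToyMac.mac(sk, tag, msg)

    def vrf(self, pk, tag, sig, msg):
        return hmac.compare_digest(sig, ToyMac.mac(pk, tag, msg))


class ToyRSIG:
    """Hypothetical stand-in for rSIG.{Key, Sign, Vrf}; its messages are tags."""

    def key(self):
        return ToyMac.key()

    def sign(self, sk, msg):
        return ToyMac.mac(sk, msg)

    def vrf(self, vk, sig, msg):
        return hmac.compare_digest(sig, ToyMac.mac(vk, msg))


class SIG1:
    """Generic composition: TOS signs the message under a tag, rSIG signs the tag."""

    def __init__(self, tos, rsig):
        self.tos, self.rsig = tos, rsig

    def key(self):
        pk_t, sk_t = self.tos.key()
        vk_r, sk_r = self.rsig.key()
        return (pk_t, vk_r), (sk_t, sk_r)

    def sign(self, sk, msg):
        sk_t, sk_r = sk
        tag = self.tos.tag()
        sig_t = self.tos.sign(sk_t, msg, tag)
        sig_r = self.rsig.sign(sk_r, tag)
        return tag, sig_t, sig_r

    def vrf(self, vk, sig, msg):
        pk_t, vk_r = vk
        tag, sig_t, sig_r = sig
        return self.tos.vrf(pk_t, tag, sig_t, msg) and self.rsig.vrf(vk_r, sig_r, tag)
```

The point of the sketch is the split of responsibilities: the message is only ever seen by the tagged one-time component, while the second component signs nothing but the freshly sampled tag, which is why random-message security suffices for it.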

We prove the above scheme is secure by showing a reduction to the security of each 
component. As our reductions are efficient in their running time, we only relate success 
probabilities. 

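Theorem 17. SIG1 is UF-CMA if TOS is uniform-tag and OT-NACMA, and rSIG is
UF-RMA. In particular,
$\mathrm{Adv}^{\text{uf-cma}}_{\mathsf{SIG1},A}(\lambda) \le \mathrm{Adv}^{\text{ot-nacma}}_{\mathsf{TOS},B}(\lambda) + \mathrm{Adv}^{\text{uf-rma}}_{\mathsf{rSIG},C}(\lambda)$.

Proof. Any signature that is accepted by the verification algorithm must either reuse an
existing tag, or sign a new tag. The success probability $\mathrm{Adv}^{\text{uf-cma}}_{\mathsf{SIG1},A}(\lambda)$ of an attacker
on SIG1 is bounded by the sum of the success probability $\mathrm{Adv}^{\text{ot-nacma}}_{\mathsf{TOS},B}(\lambda)$ of an attacker
on TOS and the success probability $\mathrm{Adv}^{\text{uf-rma}}_{\mathsf{rSIG},C}(\lambda)$ of an attacker on rSIG.

Game 0: The actual unforgeability game. $\Pr[\text{Game 0}] = \mathrm{Adv}^{\text{uf-cma}}_{\mathsf{SIG1},A}(\lambda)$.

Game 1: The real security game except that the winning condition is changed to no
longer accept repetition of tags.

Lemma 18. $|\Pr[\text{Game 0}] - \Pr[\text{Game 1}]| \le \mathrm{Adv}^{\text{ot-nacma}}_{\mathsf{TOS},B}(\lambda)$.

Game 2: The fully idealized game. The winning condition is changed to reject all
signatures.

Lemma 19. $|\Pr[\text{Game 1}] - \Pr[\text{Game 2}]| \le \mathrm{Adv}^{\text{uf-rma}}_{\mathsf{rSIG},C}(\lambda)$.

Thus $\mathrm{Adv}^{\text{uf-cma}}_{\mathsf{SIG1},A}(\lambda) = \Pr[\text{Game 0}] \le \mathrm{Adv}^{\text{ot-nacma}}_{\mathsf{TOS},B}(\lambda) + \mathrm{Adv}^{\text{uf-rma}}_{\mathsf{rSIG},C}(\lambda)$ as claimed.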
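
The last step of this proof can be spelled out as follows. The derivation only uses the triangle inequality, Lemmas 18 and 19, and the observation that no forgery is ever accepted in Game 2; the adversary names B and C are the reduction algorithms implied by the two lemmas.

```latex
\begin{align*}
\mathrm{Adv}^{\text{uf-cma}}_{\mathsf{SIG1},A}(\lambda)
  &= \Pr[\text{Game 0}] \\
  &\le \bigl|\Pr[\text{Game 0}] - \Pr[\text{Game 1}]\bigr|
     + \bigl|\Pr[\text{Game 1}] - \Pr[\text{Game 2}]\bigr|
     + \Pr[\text{Game 2}] \\
  &\le \mathrm{Adv}^{\text{ot-nacma}}_{\mathsf{TOS},B}(\lambda)
     + \mathrm{Adv}^{\text{uf-rma}}_{\mathsf{rSIG},C}(\lambda)
     + 0 ,
\end{align*}
% Pr[Game 2] = 0 because the winning condition of Game 2 rejects all signatures.
```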

Theorem 20. If TOS.Tag produces constant-size tags and TOS produces signatures of
constant size in the size of input messages, the resulting SIG1 produces constant-size
signatures as well. Furthermore, if TOS and rSIG are structure-preserving, so is SIG1.

We omit the proof of Theorem 20, as it follows simply by examining the construction.

4.2 SIG2: Combining Partial One-Time and XRMA-Secure Signatures 

Let xSIG be a signature scheme with message space $\mathcal{M}_x$, and POS be a partial one-time
signature scheme with one-time public-key space $\mathcal{K}_{opk}$ such that $\mathcal{M}_x = \mathcal{K}_{opk}$. We
construct a signature scheme SIG2 from xSIG and POS. Let gk be a global parameter
generated by Setup($1^\lambda$).

- SIG2.Key(gk): Run $(pk_p, sk_p) \leftarrow$ POS.Key(gk), $(vk_x, sk_x) \leftarrow$ xSIG.Key(gk).
Output $vk := (pk_p, vk_x)$ and $sk := (sk_p, sk_x)$.

- SIG2.Sign(sk, msg): Parse sk into $(sk_p, sk_x)$. Run (opk, osk) $\leftarrow$ POS.Update(),
$\sigma_p \leftarrow$ POS.Sign($sk_p$, msg, osk), $\sigma_x \leftarrow$ xSIG.Sign($sk_x$, opk). Output
$\sigma := (opk, \sigma_p, \sigma_x)$.

- SIG2.Vrf(vk, $\sigma$, msg): Parse vk and $\sigma$ accordingly. Output 1 if
1 = POS.Vrf($pk_p$, opk, $\sigma_p$, msg) and 1 = xSIG.Vrf($vk_x$, $\sigma_x$, opk). Output 0 otherwise.

Theorem 21. SIG2 is UF-CMA if POS is uniform-key and OT-NACMA, and xSIG is
UF-XRMA w.r.t. POS.Update as the message generator. In particular,
$\mathrm{Adv}^{\text{uf-cma}}_{\mathsf{SIG2},A}(\lambda) \le \mathrm{Adv}^{\text{ot-nacma}}_{\mathsf{POS},B}(\lambda) + \mathrm{Adv}^{\text{uf-xrma}}_{\mathsf{xSIG},C}(\lambda)$.

Proof. The proof is almost the same as that for Theorem 17. The only difference appears
in constructing C in the second step. Since POS.Update is used as the extended
random message generator, the pair (msg, aux) is in fact (opk, osk). Given (opk, osk),
adversary C can run POS.Sign(sk, msg, osk) to yield legitimate signatures.

Theorem 22. If POS produces constant-size one-time public-keys and signatures in 
the size of input messages, resulting SIG2 produces constant-size signatures as well. 
Furthermore, if POS and xSIG are structure-preserving, so is SIG2. 

5 Instantiating SIG1

We instantiate the building blocks TOS and rSIG of our first generic construction to
obtain our first SPS scheme. We do so in the Type-I bilinear group setting. The resulting
SIG1 scheme is an efficient structure-preserving signature scheme based only on the
DLIN assumption.

Setup for Type-I groups. The following setup procedure is common for all instantiations 
in this section. The global parameter gk is given to all functions implicitly. 

Setup($1^\lambda$): Run $\Lambda = (p, \mathbb{G}, \mathbb{G}_T, e) \leftarrow \mathcal{G}(1^\lambda)$ and pick random generators
$(G, C, F, U_1, U_2) \leftarrow (\mathbb{G}^*)^5$. Output $gk := (\Lambda, G, C, F, U_1, U_2)$.

The parameters gk fix the message space
$\mathcal{M}_r := \{ (C^{m_1}, C^{m_2}, F^{m_1}, F^{m_2}, U_1^{m_1}, U_2^{m_2}) \in \mathbb{G}^6 \mid (m_1, m_2) \in \mathbb{Z}_p^2 \}$
for the RMA-secure signature scheme defined below. For our generic framework to work,
the tagged one-time signature scheme should have the same tag space.

Tagged one-time signature scheme. Basically, a tag in our scheme consists of a pair of
elements in $\mathbb{G}$. However, due to a constraint from rSIG we show in the next section, the
tags will have to be in an extended form. We therefore parameterize the one-time key
generation function Update with a flag mode $\in$ {normal, extended} so that it outputs
a key in the original or extended form. Although mode is given to Update as input,
it should be considered as a fixed system-wide parameter that is common for every
invocation of Update and the key space is fixed throughout the use of the scheme.
Accordingly, this extension does not affect the security model at all.

TOS.Key(gk): Parse $gk = (\Lambda, G, C, F, U_1, U_2)$. Pick random $x_r, y_r, x_s, y_s, x_t, y_t,
x_0, y_0, \ldots, x_k, y_k$ in $\mathbb{Z}_p^*$ such that $x_r y_s \neq x_s y_r$ and compute $G_r := G^{x_r}$,
$H_r := G^{y_r}$, $G_s := G^{x_s}$, $H_s := G^{y_s}$, $G_t := G^{x_t}$, $H_t := G^{y_t}$, $G_0 := G^{x_0}$,
$H_0 := G^{y_0}$, ..., $G_k := G^{x_k}$, $H_k := G^{y_k}$. Output
$pk := (G_r, G_s, G_t, H_r, H_s, H_t, G_0, \ldots, G_k, H_0, \ldots, H_k)$ and
$sk := (x_r, x_s, x_t, y_r, y_s, y_t, x_0, \ldots, x_k, y_0, \ldots, y_k)$.

TOS.Tag(): Take generators $G, C, F, U_1, U_2$ from gk. Choose $w_1, w_2 \leftarrow \mathbb{Z}_p^*$ and
compute $tag := (C^{w_1}, C^{w_2}, F^{w_1}, F^{w_2}, U_1^{w_1}, U_2^{w_2})$. Output tag.

TOS.Sign(sk, msg, tag): Parse msg to $(M_1, \ldots, M_k)$ and tag to $(T_1, T_2, \ldots)$. Parse
sk accordingly. Choose random $m \leftarrow \mathbb{Z}_p$ and let $M_0 := G^{m} \prod_{i=1}^{k} M_i^{-1}$.
(This is uniformly distributed.) Compute
$A := G^{-x_t}\, T_1^{-m} \prod_{i=0}^{k} M_i^{-x_i}$ and $B := G^{-y_t}\, T_2^{-m} \prod_{i=0}^{k} M_i^{-y_i}$.
Since $x_r y_s \neq x_s y_r$ we can compute
$$\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} := \begin{pmatrix} x_r & x_s \\ y_r & y_s \end{pmatrix}^{-1}.$$
(The determinant is nonzero.) Compute $Z := A^{\alpha} B^{\beta}$ and $W := A^{\gamma} B^{\delta}$. Output
$\sigma := (Z, W, M_0)$.

TOS.Vrf(pk, tag, msg, $\sigma$): Accept if the following equalities hold:

$$e(G_r, Z) \cdot e(G_s, W) \cdot e(G_t, G) \prod_{i=0}^{k} e(G_i T_1, M_i) = 1$$

$$e(H_r, Z) \cdot e(H_s, W) \cdot e(H_t, G) \prod_{i=0}^{k} e(H_i T_2, M_i) = 1$$

We remark that the correctness of the extended tag $(T_3, \ldots, T_6)$ is not examined within
this scheme. (We only need to show that the extended part is simulatable in the security
proof.) Since the tag is given to rSIG as a message, it is the verification function of rSIG
that verifies the correctness with respect to its message space, which is the same as the
tag space. The scheme is structure-preserving by construction, and correctness is easily
verified by simple calculation.
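
The correctness calculation can be checked mechanically in a toy model that tracks discrete logarithms instead of group elements. The sketch below is an assumption-laden illustration, not a secure implementation: every element $G^a$ is represented by its exponent a modulo a prime P, the pairing $e(G^a, G^b)$ by the product ab, and "= 1" in the target group by "≡ 0". It also assumes that $M_0$ is chosen so that $\prod_{i=0}^{k} M_i = G^m$, which is the reading under which both verification equations hold. In this model the public key exponents coincide with the secret key, so all function names here are purely illustrative.

```python
import random

P = 2**61 - 1  # prime order of the toy group (a Mersenne prime)


def keygen(k, rng):
    """Exponents of (G_r, G_s, G_t, H_r, H_s, H_t, G_0..G_k, H_0..H_k)."""
    while True:
        xr, yr, xs, ys = (rng.randrange(1, P) for _ in range(4))
        if (xr * ys - xs * yr) % P != 0:  # the 2x2 matrix must be invertible
            break
    return {
        "xr": xr, "yr": yr, "xs": xs, "ys": ys,
        "xt": rng.randrange(1, P), "yt": rng.randrange(1, P),
        "x": [rng.randrange(1, P) for _ in range(k + 1)],
        "y": [rng.randrange(1, P) for _ in range(k + 1)],
    }


def make_tag(c, rng):
    """Exponents of (T_1, T_2) = (C^{w1}, C^{w2}); c is the discrete log of C."""
    return c * rng.randrange(1, P) % P, c * rng.randrange(1, P) % P


def sign(sk, msg, tag, rng):
    """msg is the list of exponents of M_1..M_k; returns exponents of (Z, W, M_0)."""
    t1, t2 = tag
    m = rng.randrange(P)
    m0 = (m - sum(msg)) % P  # M_0 = G^m * prod_{i>=1} M_i^{-1}
    ms = [m0] + list(msg)
    a = (-sk["xt"] - m * t1 - sum(x * mi for x, mi in zip(sk["x"], ms))) % P
    b = (-sk["yt"] - m * t2 - sum(y * mi for y, mi in zip(sk["y"], ms))) % P
    det_inv = pow((sk["xr"] * sk["ys"] - sk["xs"] * sk["yr"]) % P, -1, P)
    # (alpha, beta; gamma, delta) is the inverse of (x_r, x_s; y_r, y_s)
    z = (sk["ys"] * a - sk["xs"] * b) * det_inv % P
    w = (-sk["yr"] * a + sk["xr"] * b) * det_inv % P
    return z, w, m0


def verify(pk, tag, msg, sig):
    """Both pairing product equations, written additively in the exponent."""
    t1, t2 = tag
    z, w, m0 = sig
    ms = [m0] + list(msg)
    eq1 = pk["xr"] * z + pk["xs"] * w + pk["xt"] + sum(
        (x + t1) * mi for x, mi in zip(pk["x"], ms))
    eq2 = pk["yr"] * z + pk["ys"] * w + pk["yt"] + sum(
        (y + t2) * mi for y, mi in zip(pk["y"], ms))
    return eq1 % P == 0 and eq2 % P == 0
```

Signing a message and verifying it with the same tag succeeds for every choice of randomness, while changing any message exponent breaks the first equation unless $x_i + \log_G T_1 \equiv 0$.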

Theorem 23. The above TOS scheme is OT-CMA under the SDP. In particular, for
any A that makes at most $q_s$ signing queries,
$\mathrm{Adv}^{\text{ot-cma}}_{\mathsf{TOS},A}(\lambda) \le q_s \cdot \mathrm{Adv}^{\text{sdp}}_{\mathcal{G},B}(\lambda) + 1/p$ holds.

Proof. We show a reduction algorithm that simulates the one-time adaptive chosen
message attack game for the adversary. The reduction gets an instance of the simultaneous
double pairing assumption, $\Lambda, G_r, G_s, H_r, H_s$, and proceeds as follows.

Setup and Key Generation. It chooses $\xi, \eta, \mu$ and sets $G_t := G_r^{\xi} G_s^{\eta}$ and
$H_t := H_r^{\xi} H_s^{\mu}$. It chooses $G \in \mathbb{G}$ and random $\omega, \nu, \nu_1, \nu_2$, and computes
$gk = (\Lambda, G, C, F, U_1, U_2) = (\Lambda, G, G^{\omega}, G^{\omega\nu}, G^{\omega\nu_1}, G^{\omega\nu_2})$. It chooses random
$\rho_i, \sigma_i, \tau_i$, computes
$G_i = G_r^{\rho_i} G_s^{\sigma_i} G_t^{\tau_i} = G_r^{\rho_i + \xi\tau_i} G_s^{\sigma_i + \eta\tau_i}$ and
$H_i = H_r^{\rho_i} H_s^{\sigma_i} H_t^{\tau_i} = H_r^{\rho_i + \xi\tau_i} H_s^{\sigma_i + \mu\tau_i}$ for $i = 0, \ldots, k$, and sets
$pk = (G, G_r, G_s, G_t, H_r, H_s, H_t, G_0, \ldots, G_k, H_0, \ldots, H_k)$. (Note that $G_i, H_i$ are
correctly distributed and give no information about $\tau_i$.) It sends pk, gk to the adversary.
The reduction will pick a random session $j^*$, and assume that the adversary will try to
reuse the tag from that session.

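Queries to oracle $O_t$. When the adversary makes a query to the tag oracle $O_t$, choose
the next new session index j.

- For session $j \neq j^*$: Pick random values $\rho, \sigma, \tau \leftarrow \mathbb{Z}_p$. Compute
$(T_1, T_2) = (G_r^{\rho} G_s^{\sigma} G_t^{\tau},\; H_r^{\rho} H_s^{\sigma} H_t^{\tau}) = (G_r^{\rho + \xi\tau} G_s^{\sigma + \eta\tau},\; H_r^{\rho + \xi\tau} H_s^{\sigma + \mu\tau})$,
and set $T = (T_1, T_2, T_1^{\nu}, T_2^{\nu}, T_1^{\nu_1}, T_2^{\nu_2})$. Store $(j, \rho, \sigma, \tau)$, and return T to the
adversary.

- For session $j^*$: Pick random values $\rho, \sigma \leftarrow \mathbb{Z}_p$. Compute
$(T_1, T_2) = (G_r^{\rho} G_s^{\sigma},\; H_r^{\rho} H_s^{\sigma})$. Let $T = (T_1, T_2, T_1^{\nu}, T_2^{\nu}, T_1^{\nu_1}, T_2^{\nu_2})$. Store
$(j^*, \rho, \sigma)$, and return T to the adversary.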

Queries to oracle $O_{sig}$. When the adversary queries $O_{sig}$ for message
$M = (M_1, \ldots, M_k) \in \mathbb{G}^k$ and session j, proceed as follows.

- If $O_t$ has not yet produced a tag for session j, or $O_{sig}$ has already been queried
for session j, return $\bot$.

- For session $j \neq j^*$: Look up the stored tuple $(j, \rho, \sigma, \tau)$. Compute
$M_0 = \bigl(G \prod_{i=1}^{k} M_i^{\tau_i + \tau}\bigr)^{-1/(\tau_0 + \tau)}$. Note that for this choice of $M_0$, it will be the
case that
$e(G_t, G) \prod_{i=0}^{k} e(G_t^{\tau_i + \tau}, M_i) = e\bigl(G_t,\; M_0^{\tau_0 + \tau}\, G \prod_{i=1}^{k} M_i^{\tau_i + \tau}\bigr) = 1$
and similarly
$e(H_t, G) \prod_{i=0}^{k} e(H_t^{\tau_i + \tau}, M_i) = e\bigl(H_t,\; M_0^{\tau_0 + \tau}\, G \prod_{i=1}^{k} M_i^{\tau_i + \tau}\bigr) = 1$.
Note also that the tag is independent of $\tau$, and since $\tau$ is uniformly distributed, $M_0$ is
independent of $\tau_0, \ldots, \tau_k$ even given tag. (To see this, let $m_0, \ldots, m_k$ be the discrete
logarithms of $M_0, \ldots, M_k$ respectively and note that for any choice of
$m_1, \ldots, m_k, \tau_0, \ldots, \tau_k$ and for any $m_0$ such that $m_0 \neq -\sum_{i=1}^{k} m_i$, there is a $1/p$
chance that we will choose $\tau = \frac{-1 - \sum_{i=0}^{k} m_i \tau_i}{\sum_{i=0}^{k} m_i}$, which will yield
$M_0 = \bigl(G \prod_{i=1}^{k} M_i^{\tau_i + \tau}\bigr)^{-1/(\tau_0 + \tau)}$.) Now compute
$Z = \prod_{i=0}^{k} M_i^{-(\rho_i + \rho)}$ and $W = \prod_{i=0}^{k} M_i^{-(\sigma_i + \sigma)}$ and output the signature
$(Z, W, M_0)$.

Note that these are the unique values such that
$e(G_r, Z) \cdot e(G_s, W) \cdot e(G_t, G) \prod_{i=0}^{k} e(G_i T_1, M_i) = 1$ and similarly
$e(H_r, Z) \cdot e(H_s, W) \cdot e(H_t, G) \prod_{i=0}^{k} e(H_i T_2, M_i) = 1$. Thus, $Z, W$ are uniquely
determined by $M_0, M_1, \ldots, M_k$, tag, and pk. $M_1, \ldots, M_k$ are provided by the adversary
and, as we have argued, $M_0$, tag, pk are statistically independent of $\tau_0, \ldots, \tau_k$. We
conclude that $Z, W$ reveal no additional information about $\tau_0, \ldots, \tau_k$ even given the
rest of the adversary's view.

- For session $j^*$: Look up the stored tuple $(j^*, \rho, \sigma)$. Let
$M_0 = \bigl(G \prod_{i=1}^{k} M_i^{\tau_i}\bigr)^{-1/\tau_0}$. Note that for this choice of $M_0$, it will be the case that
$e(G_t, G) \prod_{i=0}^{k} e(G_t^{\tau_i}, M_i) = e\bigl(G_t,\; M_0^{\tau_0}\, G \prod_{i=1}^{k} M_i^{\tau_i}\bigr) = 1$ and similarly
$e(H_t, G) \prod_{i=0}^{k} e(H_t^{\tau_i}, M_i) = e\bigl(H_t,\; M_0^{\tau_0}\, G \prod_{i=1}^{k} M_i^{\tau_i}\bigr) = 1$.
Note that $T_1, T_2$ are correctly distributed, that $M_0$ is statistically close to uniform
since $\tau_0, \ldots, \tau_k$ are chosen at random, and furthermore that the only information
revealed about $\tau_0, \ldots, \tau_k$ is that $G \prod_{i=0}^{k} M_i^{\tau_i} = 1$. Now, compute
$Z = \prod_{i=0}^{k} M_i^{-\rho_i - \rho}$ and $W = \prod_{i=0}^{k} M_i^{-\sigma_i - \sigma}$ and output the signature
$(Z, W, M_0)$. Again all values are independent of $\tau_0, \ldots, \tau_k$ with the exception now of
$M_0$, which is chosen so that $G \prod_{i=0}^{k} M_i^{\tau_i} = 1$.

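Processing the adversary's forgery. Now, suppose that the adversary produces
$(M_1^\dagger, \ldots, M_k^\dagger)$ and $(Z^\dagger, W^\dagger, M_0^\dagger)$ for $T = (T_1, T_2, \ldots)$ used in the $j^*$th query. Look
up the stored tuple $(j^*, \rho, \sigma)$. Then with non-negligible probability (whenever the
adversary succeeds) we have
TOS.Vrf(pk, T, $(M_1^\dagger, \ldots, M_k^\dagger)$, $(Z^\dagger, W^\dagger, M_0^\dagger)$) = 1. This means

$$1 = e\Bigl(G_r,\; Z^\dagger G^{\xi} \prod_{i=0}^{k} (M_i^\dagger)^{\rho_i + \rho + \xi\tau_i}\Bigr) \cdot e\Bigl(G_s,\; W^\dagger G^{\eta} \prod_{i=0}^{k} (M_i^\dagger)^{\sigma_i + \sigma + \eta\tau_i}\Bigr), \text{ and}$$

$$1 = e\Bigl(H_r,\; Z^\dagger G^{\xi} \prod_{i=0}^{k} (M_i^\dagger)^{\rho_i + \rho + \xi\tau_i}\Bigr) \cdot e\Bigl(H_s,\; W^\dagger G^{\mu} \prod_{i=0}^{k} (M_i^\dagger)^{\sigma_i + \sigma + \mu\tau_i}\Bigr).$$

So if $Z^\dagger G^{\xi} \prod_{i=0}^{k} (M_i^\dagger)^{\rho_i + \rho + \xi\tau_i} \neq 1$, then

$$(Z^*, R^*, S^*) := \Bigl(Z^\dagger G^{\xi} \prod_{i=0}^{k} (M_i^\dagger)^{\rho_i + \rho + \xi\tau_i},\;\; W^\dagger G^{\eta} \prod_{i=0}^{k} (M_i^\dagger)^{\sigma_i + \sigma + \eta\tau_i},\;\; W^\dagger G^{\mu} \prod_{i=0}^{k} (M_i^\dagger)^{\sigma_i + \sigma + \mu\tau_i}\Bigr)$$

is a valid solution for the simultaneous double pairing assumption.

We have
$Z^\dagger G^{\xi} \prod_{i=0}^{k} (M_i^\dagger)^{\rho_i + \rho + \xi\tau_i} = Z^\dagger \prod_{i=0}^{k} (M_i^\dagger)^{\rho_i + \rho} \bigl(G \prod_{i=0}^{k} (M_i^\dagger)^{\tau_i}\bigr)^{\xi}$,
and a part of $Z^\dagger \prod_{i=0}^{k} (M_i^\dagger)^{\rho_i + \rho}$ is information-theoretically hidden. Note that the only
information that the adversary has about $\tau_0, \ldots, \tau_k$ is that in the $j^*$th session $M_0$ was
chosen so that $G \prod_{i=0}^{k} M_i^{\tau_i} = 1$ (where $M = (M_1, \ldots, M_k)$ is the message signed in the
$j^*$th session). If $M_i^\dagger \neq M_i$ for at least one i, then the probability that
$G \prod_{i=0}^{k} (M_i^\dagger)^{\tau_i} = 1$ conditioned on the fact that $G \prod_{i=0}^{k} M_i^{\tau_i} = 1$ is $1/p$. As a result,
the probability that $Z^\dagger G^{\xi} \prod_{i=0}^{k} (M_i^\dagger)^{\rho_i + \rho + \xi\tau_i} = 1$ is $1/p$.

Thus, if the guess for $j^*$ is right, we succeed with all but probability $1/p$ whenever A
does. We therefore have $\mathrm{Adv}^{\text{ot-cma}}_{\mathsf{TOS},A}(\lambda) \le q_s \cdot \mathrm{Adv}^{\text{sdp}}_{\mathcal{G},B}(\lambda) + 1/p$.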





RMA-secure signature scheme. For our random message signature scheme we will use 
a construction based on the dual system signature proposed in iPTTl . While the orig- 
inal scheme is CMA-secure under the DLIN assumption, the security proof makes 
use of a trapdoor commitment to elements in $\mathbb{Z}_p$ and consequently messages are
elements in $\mathbb{Z}_p$ rather than $\mathbb{G}$. Our construction below resorts to RMA-security and
removes this commitment to allow messages to be a sequence of random group elements
satisfying a particular relation. As mentioned above, the message space
$\mathcal{M}_r := \{ (C^{m_1}, C^{m_2}, F^{m_1}, F^{m_2}, U_1^{m_1}, U_2^{m_2}) \in \mathbb{G}^6 \mid (m_1, m_2) \in \mathbb{Z}_p^2 \}$
is defined by generators $(C, F, U_1, U_2)$ in gk.

rSIG.Key(gk): Given $gk := (\Lambda, G, C, F, U_1, U_2)$ as input, uniformly select
$V, V_1, V_2, H$ from $\mathbb{G}^*$ and $a_1, a_2, b, \alpha$, and $\rho$ from $\mathbb{Z}_p^*$. Then compute and output
$vk := (B, A_1, A_2, B_1, B_2, R_1, R_2, W_1, W_2, V, V_1, V_2, H, X_1, X_2)$ and
$sk := (vk, K_1, K_2)$ where

$$B := G^{b}, \quad A_1 := G^{a_1}, \quad A_2 := G^{a_2}, \quad B_1 := G^{b a_1}, \quad B_2 := G^{b a_2},$$

$$R_1 := V V_1^{a_1}, \quad R_2 := V V_2^{a_2}, \quad W_1 := R_1^{b}, \quad W_2 := R_2^{b},$$

$$X_1 := G^{\rho}, \quad X_2 := G^{\alpha a_1 b / \rho}, \quad K_1 := G^{\alpha}, \quad K_2 := G^{\alpha a_1}.$$

rSIG.Sign(sk, msg): Parse msg into $(M_1, M_2, M_3, M_4, M_5, M_6)$. Pick random
$r_1, r_2, z_1, z_2 \in \mathbb{Z}_p$. Let $r = r_1 + r_2$. Compute and output signature
$\sigma := (S_0, S_1, \ldots, S_7)$ where

$$S_0 := (M_5 M_6 H)^{r_1}, \quad S_1 := K_2 V^{r}, \quad S_2 := K_1^{-1} V_1^{r} G^{z_1}, \quad S_3 := B^{-z_1},$$

$$S_4 := V_2^{r} G^{z_2}, \quad S_5 := B^{-z_2}, \quad S_6 := B^{r_2}, \quad S_7 := G^{r_1}.$$

rSIG.Vrf(vk, $\sigma$, msg): Parse msg into $(M_1, M_2, M_3, M_4, M_5, M_6)$ and $\sigma$ into
$(S_0, S_1, \ldots, S_7)$. Also parse vk accordingly. Verify the following pairing product
equations:

$$e(S_7, M_5 M_6 H) = e(G, S_0)$$

$$e(S_1, B)\, e(S_2, B_1)\, e(S_3, A_1) = e(S_6, R_1)\, e(S_7, W_1)$$

$$e(S_1, B)\, e(S_4, B_2)\, e(S_5, A_2) = e(S_6, R_2)\, e(S_7, W_2)\, e(X_1, X_2)$$

$$e(F, M_1) = e(C, M_3), \quad e(F, M_2) = e(C, M_4), \quad e(U_1, M_1) = e(C, M_5), \quad e(U_2, M_2) = e(C, M_6)$$

The scheme is structure-preserving by construction and the correctness is easily veri- 
fied. 
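
As a sanity check of the correctness claim, the verification equations of rSIG can be evaluated in a toy model that works with discrete logarithms only. This is a sketch under explicit assumptions, not a usable implementation: each element $G^a$ is stored as the exponent a modulo a prime P, $e(G^a, G^b)$ becomes the product ab, and equality of target-group elements becomes equality modulo P. The helper to_simulation_type is hypothetical and encodes the shifted form $(S_1 G^{-a_1 a_2 \gamma}, S_2 G^{a_2 \gamma}, S_4 G^{a_1 \gamma})$ used for simulation-type signatures in the security proof; the model has no security because all exponents are exposed.

```python
import random

P = 2**61 - 1  # prime order of the toy group (a Mersenne prime)


def keygen(rng):
    v, v1, v2, h, a1, a2, b, alpha, rho = (rng.randrange(1, P) for _ in range(9))
    r1, r2 = (v + v1 * a1) % P, (v + v2 * a2) % P  # R_1 = V V_1^{a1}, R_2 = V V_2^{a2}
    vk = {
        "B": b, "A1": a1, "A2": a2, "B1": b * a1 % P, "B2": b * a2 % P,
        "R1": r1, "R2": r2, "W1": r1 * b % P, "W2": r2 * b % P,
        "V": v, "V1": v1, "V2": v2, "H": h,
        "X1": rho, "X2": alpha * a1 * b * pow(rho, -1, P) % P,
    }
    return vk, {"vk": vk, "K1": alpha, "K2": alpha * a1 % P}


def encode(gk, m1, m2):
    """Exponents of (C^m1, C^m2, F^m1, F^m2, U1^m1, U2^m2)."""
    c, f, u1, u2 = gk
    return [c * m1 % P, c * m2 % P, f * m1 % P, f * m2 % P, u1 * m1 % P, u2 * m2 % P]


def sign(sk, msg, rng):
    vk = sk["vk"]
    r1, r2, z1, z2 = (rng.randrange(P) for _ in range(4))
    r = r1 + r2
    return [
        (msg[4] + msg[5] + vk["H"]) * r1 % P,  # S0 = (M5 M6 H)^{r1}
        (sk["K2"] + vk["V"] * r) % P,          # S1 = K2 V^r
        (-sk["K1"] + vk["V1"] * r + z1) % P,   # S2 = K1^{-1} V1^r G^{z1}
        -vk["B"] * z1 % P,                     # S3 = B^{-z1}
        (vk["V2"] * r + z2) % P,               # S4 = V2^r G^{z2}
        -vk["B"] * z2 % P,                     # S5 = B^{-z2}
        vk["B"] * r2 % P,                      # S6 = B^{r2}
        r1,                                    # S7 = G^{r1}
    ]


def verify(gk, vk, s, m):
    c, f, u1, u2 = gk
    checks = [  # each entry is (left side) - (right side) of one equation
        s[7] * (m[4] + m[5] + vk["H"]) - s[0],
        s[1] * vk["B"] + s[2] * vk["B1"] + s[3] * vk["A1"]
        - s[6] * vk["R1"] - s[7] * vk["W1"],
        s[1] * vk["B"] + s[4] * vk["B2"] + s[5] * vk["A2"]
        - s[6] * vk["R2"] - s[7] * vk["W2"] - vk["X1"] * vk["X2"],
        f * m[0] - c * m[2], f * m[1] - c * m[3],
        u1 * m[0] - c * m[4], u2 * m[1] - c * m[5],
    ]
    return all(x % P == 0 for x in checks)


def to_simulation_type(vk, sig, gamma):
    """Shift S1, S2, S4 by gamma as in the simulation-type signatures."""
    a1, a2 = vk["A1"], vk["A2"]
    out = list(sig)
    out[1] = (sig[1] - a1 * a2 * gamma) % P
    out[2] = (sig[2] + a2 * gamma) % P
    out[4] = (sig[4] + a1 * gamma) % P
    return out
```

In this model both a normal signature and its simulation-type variant pass verification, since the extra terms $-b a_1 a_2 \gamma$ and $+b a_1 a_2 \gamma$ cancel in the second and third equations.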

Theorem 24. The above rSIG scheme is UF-RMA under the DLIN assumption. In
particular, for any p.p.t. adversary A against rSIG that makes at most $q_s$ signing queries,
there exists a p.p.t. algorithm B for DLIN such that
$\mathrm{Adv}^{\text{uf-rma}}_{\mathsf{rSIG},A}(\lambda) \le (q_s + 2) \cdot \mathrm{Adv}^{\text{dlin}}_{\mathcal{G},B}(\lambda)$.

Proof. We refer to the signatures output by the signing algorithm as normal signatures.
In the proof we will consider an additional type of signatures, which we refer to
as simulation-type signatures, that are computationally indistinguishable but easier to
simulate. For $\gamma \in \mathbb{Z}_p$, simulation-type signatures are of the form
$\sigma = (S_0,\; S_1' = S_1 \cdot G^{-a_1 a_2 \gamma},\; S_2' = S_2 \cdot G^{a_2 \gamma},\; S_3,\; S_4' = S_4 \cdot G^{a_1 \gamma},\; S_5, \ldots, S_7)$.
We give the outline of the proof using some lemmas.
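
Lemma 25. Any signature that is accepted by the verification algorithm must be
formed either as a normal signature, or a simulation-type signature.

We consider a sequence of games. Let $p_i(\lambda)$ be the probability that the adversary succeeds
in Game i, and $p_i^{\text{norm}}(\lambda)$ and $p_i^{\text{sim}}(\lambda)$ that he succeeds with a normal-type respectively
simulation-type forgery. Then by Lemma 25, $p_i(\lambda) = p_i^{\text{norm}}(\lambda) + p_i^{\text{sim}}(\lambda)$ for all i.

Game 0: The actual unforgeability under random message attacks game.

Lemma 26. There exists an adversary $B_1$ such that $p_0^{\text{sim}}(\lambda) = \mathrm{Adv}^{\text{dlin}}_{\mathcal{G},B_1}(\lambda)$.

Game i: The real security game except that the first i signatures that are given by the
oracle are simulation-type signatures.

Lemma 27. There exists an adversary $B_2$ such that
$|p_{i-1}^{\text{norm}}(\lambda) - p_i^{\text{norm}}(\lambda)| = \mathrm{Adv}^{\text{dlin}}_{\mathcal{G},B_2}(\lambda)$.

Game q: All signatures that are given by the oracle are simulation-type signatures.

Lemma 28. There exists an adversary $B_3$ such that $p_q^{\text{norm}}(\lambda) = \mathrm{Adv}^{\text{cdh}}_{\mathcal{G},B_3}(\lambda)$.

We have shown that in Game q, A can output a normal-type forgery with at most
negligible probability. Thus, by Lemma 27 we can conclude that the same is true in
Game 0 and it holds that

$$\begin{aligned}
\mathrm{Adv}^{\text{uf-rma}}_{\mathsf{rSIG},A}(\lambda) = p_0(\lambda) &= p_0^{\text{sim}}(\lambda) + p_0^{\text{norm}}(\lambda)
 \le p_0^{\text{sim}}(\lambda) + \sum_{i=1}^{q} \bigl|p_{i-1}^{\text{norm}}(\lambda) - p_i^{\text{norm}}(\lambda)\bigr| + p_q^{\text{norm}}(\lambda) \\
 &\le \mathrm{Adv}^{\text{dlin}}_{\mathcal{G},B_1}(\lambda) + q\,\mathrm{Adv}^{\text{dlin}}_{\mathcal{G},B_2}(\lambda) + \mathrm{Adv}^{\text{cdh}}_{\mathcal{G},B_3}(\lambda)
 \le (q + 2) \cdot \mathrm{Adv}^{\text{dlin}}_{\mathcal{G},B}(\lambda).
\end{aligned}$$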

Let MSGGen be an extended random message generator that first chooses
$aux = (m_1, m_2)$ randomly from $\mathbb{Z}_p^2$ and then computes
$msg = (C^{m_1}, C^{m_2}, F^{m_1}, F^{m_2}, U_1^{m_1}, U_2^{m_2})$. Note that this is what the reduction
algorithm does in the proof of Theorem 24. Therefore, the same reduction algorithm works
for the case of extended random message attacks with respect to message generator
MSGGen. We thus have the following.

Corollary 29. Under the DLIN assumption, the rSIG scheme is UF-XRMA w.r.t. the
message generator that provides $aux = (m_1, m_2)$ for every message
$msg = (C^{m_1}, C^{m_2}, F^{m_1}, F^{m_2}, U_1^{m_1}, U_2^{m_2})$. In particular, for any p.p.t. adversary A against
rSIG that is given at most $q_s$ signatures, there exists a p.p.t. algorithm B such that
$\mathrm{Adv}^{\text{uf-xrma}}_{\mathsf{rSIG},A}(\lambda) \le (q_s + 2) \cdot \mathrm{Adv}^{\text{dlin}}_{\mathcal{G},B}(\lambda)$.

Security and efficiency of resulting SIG1. Let SIG1 be the signature scheme obtained
from TOS (with mode = extended) and rSIG by following the first generic construction
in Section 4. From Theorems 17, 20, 23, and 24, the following is immediate.

Theorem 30. SIG1 is a structure-preserving signature scheme that yields constant-size
signatures, and is UF-CMA under the DLIN assumption. In particular, for any p.p.t.
adversary A for SIG1 making at most $q_s$ signing queries, there exists a p.p.t. algorithm B
such that $\mathrm{Adv}^{\text{uf-cma}}_{\mathsf{SIG1},A}(\lambda) \le (q_s + 3) \cdot \mathrm{Adv}^{\text{dlin}}_{\mathcal{G},B}(\lambda) + 1/p$.

6 Instantiating SIG2

We instantiate the POS and xSIG building blocks of our second generic construction 
to obtain our second SPS scheme. Here we choose the Type-III bilinear group setting. 
The resulting SIG2 scheme is an efficient structure-preserving signature scheme based 
on SXDH and XDLIN. 

Setup for Type-III groups. The following setup procedure is common for all building 
blocks in this section. The global parameter gk is given to all functions implicitly. 

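- Setup($1^\lambda$): Run $\Lambda = (p, \mathbb{G}_1, \mathbb{G}_2, \mathbb{G}_T, e) \leftarrow \mathcal{G}(1^\lambda)$ and choose generators $G \in \mathbb{G}_1^*$
and $\hat{G} \in \mathbb{G}_2^*$. Also choose $u, f_2, f_3$ randomly from $\mathbb{Z}_p^*$, compute $F_2 := G^{f_2}$,
$F_3 := G^{f_3}$, $\hat{F}_2 := \hat{G}^{f_2}$, $\hat{F}_3 := \hat{G}^{f_3}$, $U := G^{u}$, $\hat{U} := \hat{G}^{u}$, and output
$gk := (\Lambda, G, \hat{G}, F_2, F_3, \hat{F}_2, \hat{F}_3, U, \hat{U})$.

A gk defines a message space $\mathcal{M}_x = \{ (\hat{F}_2^{m}, \hat{F}_3^{m}, \hat{U}^{m}) \in \mathbb{G}_2^3 \mid m \in \mathbb{Z}_p \}$ for the
signature scheme in this section. For our generic construction to work, the partial
one-time signature scheme should have the same key space.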

Partial one-time signatures for unilateral messages. We construct a partial one-time
signature scheme POSu2 for messages in $\mathbb{G}_2^k$ for $k > 0$. The suffix "u2" indicates
that the scheme is unilateral and messages are taken from $\mathbb{G}_2$. Correspondingly, POSu1
refers to the scheme whose messages belong to $\mathbb{G}_1$, which is obtained by swapping $\mathbb{G}_2$
and $\mathbb{G}_1$ in the following description. Our POSu2 scheme is a minor refinement of the
one-time signature scheme introduced in E9- It comes, however, with a security proof 
for the new security model. 

Basically, a one-time public-key in our scheme consists of one element in the base
group $\mathbb{G}_1$ that is the opposite of the group $\mathbb{G}_2$ messages belong to. This property
is very useful to construct a POS scheme for signing bilateral messages. As with the
tags of TOS in Section 5, the one-time public-keys of POS will have to be in
an extended form to meet the constraint from xSIG presented in the sequel. We use
mode $\in$ {normal, extended} for this purpose again.

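- POSu2.Key(gk): Take generators $U$ and $\hat{U}$ from gk. Choose $w_r$ randomly from
$\mathbb{Z}_p^*$ and compute $G_r := U^{w_r}$. For $i = 1, \ldots, k$, uniformly choose $\chi_i$ and $\gamma_i$ from
$\mathbb{Z}_p$ and compute $G_i := U^{\chi_i} G_r^{\gamma_i}$. Output $pk := (G_r, G_1, \ldots, G_k) \in \mathbb{G}_1^{k+1}$ and
$sk := (\chi_1, \gamma_1, \ldots, \chi_k, \gamma_k, w_r)$.

- POSu2.Update(mode): Take $F_2, F_3, U$ from gk. Choose $a \leftarrow \mathbb{Z}_p$ and output
$opk := U^{a} \in \mathbb{G}_1$ if mode = normal or $opk := (F_2^{a}, F_3^{a}, U^{a}) \in \mathbb{G}_1^3$ if
mode = extended. Also output $osk := a$.

- POSu2.Sign(sk, msg, osk): Parse msg into $(\hat{M}_1, \ldots, \hat{M}_k) \in \mathbb{G}_2^k$. Take $a$ and $w_r$
from osk and sk, respectively. Choose $\rho$ randomly from $\mathbb{Z}_p$ and compute
$\zeta := a - \rho w_r \bmod p$. Then compute and output $\sigma := (\hat{Z}, \hat{R}) \in \mathbb{G}_2^2$ as the signature,
where $\hat{Z} := \hat{U}^{\zeta} \prod_{i=1}^{k} \hat{M}_i^{-\chi_i}$ and $\hat{R} := \hat{U}^{\rho} \prod_{i=1}^{k} \hat{M}_i^{-\gamma_i}$.

- POSu2.Vrf(pk, $\sigma$, msg, opk): Parse $\sigma$ as $(\hat{Z}, \hat{R}) \in \mathbb{G}_2^2$, msg as
$(\hat{M}_1, \ldots, \hat{M}_k) \in \mathbb{G}_2^k$, and opk as $(A_2, A_3, A)$ or $A$ depending on mode. Return 1 if
$e(A, \hat{U}) = e(U, \hat{Z})\, e(G_r, \hat{R}) \prod_{i=1}^{k} e(G_i, \hat{M}_i)$ holds. Return 0 otherwise.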

Scheme POSu2 is structure-preserving and has the uniform one-time public-key property
by construction. We can easily verify that it is correct by simple calculation.
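
The calculation behind the correctness of POSu2 can be written out as follows. It assumes the signature components $\hat{Z} = \hat{U}^{\zeta} \prod_i \hat{M}_i^{-\chi_i}$ and $\hat{R} = \hat{U}^{\rho} \prod_i \hat{M}_i^{-\gamma_i}$ with $\zeta = a - \rho w_r$, which is the form under which the verification equation balances, and uses only bilinearity of e together with $G_r = U^{w_r}$ and $G_i = U^{\chi_i} G_r^{\gamma_i}$.

```latex
\begin{align*}
e(U, \hat{Z})\, e(G_r, \hat{R}) \prod_{i=1}^{k} e(G_i, \hat{M}_i)
 &= e(U, \hat{U})^{\zeta} \prod_{i=1}^{k} e(U, \hat{M}_i)^{-\chi_i}
    \cdot e(G_r, \hat{U})^{\rho} \prod_{i=1}^{k} e(G_r, \hat{M}_i)^{-\gamma_i}
    \cdot \prod_{i=1}^{k} e(U, \hat{M}_i)^{\chi_i}\, e(G_r, \hat{M}_i)^{\gamma_i} \\
 &= e(U, \hat{U})^{\zeta} \cdot e(U^{w_r}, \hat{U})^{\rho} \\
 &= e(U, \hat{U})^{\zeta + \rho w_r}
  = e(U, \hat{U})^{a}
  = e(U^{a}, \hat{U})
  = e(A, \hat{U}).
\end{align*}
```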

Theorem 31. POSu2 is strongly unforgeable against OT-CMA if DBP$_1$ holds. In
particular, $\mathrm{Adv}^{\text{sot-cma}}_{\mathsf{POSu2},A}(\lambda) \le \mathrm{Adv}^{\text{dbp1}}_{\mathcal{G},B}(\lambda) + 1/p$.

Partial one-time signatures for bilateral messages. Using POSu1 for msg $\in \mathbb{G}_1^{k_1+1}$
and POSu2 for msg $\in \mathbb{G}_2^{k_2}$, we construct a POSb scheme for signing bilateral messages
$(msg_1, msg_2) \in \mathbb{G}_1^{k_1} \times \mathbb{G}_2^{k_2}$. The scheme is a simple two-story construction where
$msg_2$ is signed by POSu2 with one-time secret-key $osk_2$, and then the one-time
public-key $opk_2 \in \mathbb{G}_1$ is attached to $msg_1$ and signed by POSu1. Public-key $opk_2$ is
included in the signature, and opk is output as a one-time public-key for POSb.

- POSb.Key(gk): Run $(pk_1, sk_1) \leftarrow$ POSu1.Key(gk) and $(pk_2, sk_2) \leftarrow$
POSu2.Key(gk). Set $pk := (pk_1, pk_2)$ and $sk := (sk_1, sk_2)$, and output (pk, sk).

- POSb.Update(mode): Run (opk, osk) $\leftarrow$ POSu1.Update(mode) and output (opk, osk).

- POSb.Sign(sk, msg, osk): Parse msg into $(msg_1, msg_2) \in \mathbb{G}_1^{k_1} \times \mathbb{G}_2^{k_2}$, and sk
into $(sk_1, sk_2)$. Run $(opk_2, osk_2) \leftarrow$ POSu2.Update(normal), and compute
$\sigma_2 \leftarrow$ POSu2.Sign($sk_2$, $msg_2$, $osk_2$) and $\sigma_1 \leftarrow$ POSu1.Sign($sk_1$, $(msg_1, opk_2)$, osk).
Output $\sigma := (\sigma_1, \sigma_2, opk_2)$.

- POSb.Vrf(pk, opk, $\sigma$, msg): Parse msg into $(msg_1, msg_2) \in \mathbb{G}_1^{k_1} \times \mathbb{G}_2^{k_2}$,
and $\sigma$ into $(\sigma_1, \sigma_2, opk_2)$. If 1 = POSu1.Vrf($pk_1$, opk, $\sigma_1$, $(msg_1, opk_2)$) =
POSu2.Vrf($pk_2$, $opk_2$, $\sigma_2$, $msg_2$), output 1. Otherwise, output 0.

For a message in $\mathbb{G}_1^{k_1} \times \mathbb{G}_2^{k_2}$, the above POSb uses a public-key of size $(k_2 + 1, k_1 + 2)$,
yields a one-time public-key of size (0, 1) (for mode = normal) or (0, 3) (for mode =
extended), and a signature of size (3, 2). Verification requires 2 pairing product
equations. A one-time public-key in extended mode, which is treated as a message to xSIG
in this section, is of the form $opk = (\hat{F}_2^{a}, \hat{F}_3^{a}, \hat{U}^{a}) \in \mathbb{G}_2^3$. The structure-preserving
and uniform public-key properties are inherited from the underlying POSu1 and POSu2.

Theorem 32. Scheme POSb is unforgeable against OT-CMA if SXDH holds. In
particular, $\mathrm{Adv}^{\text{ot-cma}}_{\mathsf{POSb},A}(\lambda) \le \mathrm{Adv}^{\text{sxdh}}_{\mathcal{G},B}(\lambda) + 2/p$.

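XRMA-secure signature scheme. Our construction is based on a variant of Waters' dual
system encryption proposed by Ramanna, Chatterjee, and Sarkar. Recall that
$gk = (\Lambda, G, \hat{G}, F_2, F_3, \hat{F}_2, \hat{F}_3, U, \hat{U})$ with $\Lambda = (p, \mathbb{G}_1, \mathbb{G}_2, \mathbb{G}_T, e)$ is generated by
Setup($1^\lambda$) in advance.

xSIG.Gen(gk): On input gk, select generators $V, V', H \leftarrow \mathbb{G}_1$ and $\hat{V}, \hat{V}', \hat{H} \in \mathbb{G}_2$
such that $V \sim \hat{V}$, $V' \sim \hat{V}'$, $H \sim \hat{H}$, $F_2 \sim \hat{F}_2$, $F_3 \sim \hat{F}_3$, and exponents
$a, b, \alpha \leftarrow \mathbb{Z}_p$ and $\rho \leftarrow \mathbb{Z}_p^*$, compute $R := V (V')^{a}$, $\hat{R} := \hat{V} (\hat{V}')^{a}$, and set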
v k := (gk, G b , G a , G ba , R, R b , sk := (VK, G“, G“, G b \. 
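xSIG.Sign(sk, msg): On input message $msg = (M_1, M_2, M_0) = (\hat{F}_2^{m}, \hat{F}_3^{m}, \hat{U}^{m}) \in \mathbb{G}_2^3$
($m \in \mathbb{Z}_p$), select $r_1, r_2, z \leftarrow \mathbb{Z}_p$, set $r := r_1 + r_2$, compute $\sigma_0 := (M_0 \hat{H})^{r_1}$,
$\sigma_1 := G^{\alpha} V^{r}$, $\sigma_2 := (V')^{r} G^{-z}$, $\sigma_3 := (G^{b})^{z}$, $\sigma_4 := (G^{b})^{r_2}$, and $\sigma_5 := G^{r_1}$, and
output $\sigma := (\sigma_0, \sigma_1, \ldots, \sigma_5) \in \mathbb{G}_2 \times \mathbb{G}_1^5$.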

Table 1. Efficiency of our schemes (SIG1 and SIG2) and comparison to other schemes with
constant-size signatures. The top section is for the Type-I variant, the middle section is for
unilateral messages and the lower section is for bilateral messages. Notation (x, y) represents
x elements in G1 and y in G2.

Schemes              |msg|       |gk| + |vk|           |σ|       #(PPE)   Assumptions
--------------------------------------------------------------------------------------
AHO10                k           2k + 12               7         2        q-SFP
SIG1                 k           2k + 25               17        9        DLIN
--------------------------------------------------------------------------------------
AHO10                (k1, 0)     (4, 2k1 + 8)          (5, 2)    2        q-SFP
AGHO11               (k1, 0)     (1, k1 + 4)           (3, 1)    2        q-type
SIG2: POSu1 + xSIG   (k1, 0)     (7, k1 + 13)          (7, 4)    5        SXDH, XDLIN1
--------------------------------------------------------------------------------------
POSb + AHO10         (k1, k2)    (k2 + 5, k1 + 12)     (10, 3)   3        q-SFP
AGHO11               (k1, k2)    (k2 + 3, k1 + 4)      (3, 3)    2        q-type
SIG2: POSb + xSIG    (k1, k2)    (k2 + 8, k1 + 14)     (8, 6)    6        SXDH, XDLIN1

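xSIG.Vrfy(vk, $\sigma$, msg): On input vk, $msg = (M_1, M_2, M_0)$, and signature $\sigma$, check that

$$e(F_2, M_0) = e(U, M_1), \quad e(F_3, M_0) = e(U, M_2), \quad e(\sigma_5, M_0 \hat{H}) = e(G, \sigma_0),$$

$$e(\sigma_1, \hat{G}^{b})\, e(\sigma_2, \hat{G}^{ba})\, e(\sigma_3, \hat{G}^{a}) = e(\sigma_4, \hat{R})\, e(\sigma_5, \hat{R}^{b})\, e(G^{\rho}, \hat{G}^{\alpha b/\rho}).$$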

The scheme is structure-preserving by the construction. We can easily verify the cor- 
rectness. 
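
The main verification equation of xSIG can be checked by comparing exponents of $g_T := e(G, \hat{G})$. The derivation below is a sketch under stated assumptions: v and v' denote the discrete logarithms of $V$ and $V'$ (shared with $\hat{V}$ and $\hat{V}'$ through the relation $\sim$), the signature is taken to be $\sigma_1 = G^{\alpha} V^{r}$, $\sigma_2 = (V')^{r} G^{-z}$, $\sigma_3 = G^{bz}$, $\sigma_4 = G^{b r_2}$, $\sigma_5 = G^{r_1}$ with $r = r_1 + r_2$, and $\hat{R} = \hat{V}(\hat{V}')^{a}$.

```latex
\begin{align*}
\log_{g_T}\bigl[e(\sigma_1, \hat{G}^{b})\, e(\sigma_2, \hat{G}^{ba})\, e(\sigma_3, \hat{G}^{a})\bigr]
 &= b(\alpha + v r) + ba(v' r - z) + abz \\
 &= \alpha b + b r (v + a v'), \\[4pt]
\log_{g_T}\bigl[e(\sigma_4, \hat{R})\, e(\sigma_5, \hat{R}^{b})\, e(G^{\rho}, \hat{G}^{\alpha b/\rho})\bigr]
 &= b r_2 (v + a v') + r_1 b (v + a v') + \rho \cdot \tfrac{\alpha b}{\rho} \\
 &= b r (v + a v') + \alpha b .
\end{align*}
```

Both sides agree, and the masking exponent z cancels between $\sigma_2$ and $\sigma_3$, which is why it can be chosen freshly at random for every signature.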

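Theorem 33. If the DDH$_2$ and XDLIN$_1$ assumptions hold, then the above xSIG scheme
is UF-XRMA with respect to the message generator that returns aux = m for every
random message $msg = (\hat{F}_2^{m}, \hat{F}_3^{m}, \hat{U}^{m})$. In particular, for any p.p.t. adversary A for
xSIG making at most q signing queries, there exist p.p.t. algorithms $B_1, B_2, B_3$ such that
$\mathrm{Adv}^{\text{uf-xrma}}_{\mathsf{xSIG},A}(\lambda) \le \mathrm{Adv}^{\text{ddh2}}_{\mathcal{G},B_1}(\lambda) + q\,\mathrm{Adv}^{\text{xdlin1}}_{\mathcal{G},B_2}(\lambda) + \mathrm{Adv}^{\text{co-cdh}}_{\mathcal{G},B_3}(\lambda)$.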

Security and efficiency of resulting SIG2. Let SIG2 be the scheme obtained from POSb
(with mode = extended) and xSIG. SIG2 is structure-preserving as vk, $\sigma$, and msg
consist of group elements from $\mathbb{G}_1$ and $\mathbb{G}_2$, and SIG2.Vrf evaluates pairing product
equations. From Theorems 21, 32, and 33 we obtain the following theorem.

Theorem 34. SIG2 is a structure-preserving signature scheme that is unforgeable
against adaptive chosen message attacks if SXDH and XDLIN$_1$ hold for $\mathcal{G}$.


7 Efficiency, Applications and Open Questions 

Efficiency. Table 1 summarizes the efficiency of SIG1 and SIG2. For SIG2 we consider
both unilateral and bilateral messages. We count the number of group elements excluding
a default generator for each group in gk, and distinguish between $\mathbb{G}_1$ and $\mathbb{G}_2$ and use $k_1$
and $k_2$ for the number of message elements in $\mathbb{G}_1$ and $\mathbb{G}_2$, respectively. For comparison,
we include the efficiency of the schemes AHO10 and AGHO11. For bilateral messages,
AHO10 is combined with POSb from Section 6.

Applications. Structure-preserving signatures (SPS) have become a mainstay in
cryptographic protocol design in recent years. From the many applications that benefit from
efficient SPS based on simple assumptions, we list only a few recent examples. Using
our SIG1 scheme from Section 5, both the construction of a group signature scheme with
efficient revocation by Libert, Peters and Yung and the construction of compact
verifiable shuffles by Chase et al. can be proven purely under the DLIN assumption.
All other building blocks already have efficient instantiations based on DLIN.

Hofheinz and Jager construct a structure-preserving one-time signature scheme
and use it to build a tree-based SPS scheme, say tSIG. Instead, we propose to use our
partial one-time scheme to construct tSIG. As the resulting tSIG is secure against
non-adaptive chosen message attacks, it is secure against extended random message attacks
as well. We then combine the POSb scheme and the new tSIG scheme according to
our second generic construction. As confirmed with Hofheinz and Jager, the resulting
signature scheme is significantly more efficient than theirs and is an SPS scheme with a
tight security reduction to SXDH. One can do the same in Type-I groups by using the
tagged one-time signature scheme in Section 5, whose security tightly reduces to DLIN.

As also shown by m, SPS schemes allow to implement simulation-sound NIZK 
proofs based on the Groth-Sahai proof system. Following the Naor-Yung-Sahai 113 513 811 
paradigm, one obtains structure-preserving CCA-secure public-key encryption in a 
modular fashion. 

Open Questions. 1) Can we have (X)RMA-secure schemes with a message space that 
is a simple Cartesian product of groups without sacrificing on efficiency? 2) The RMA- 
secure signature schemes developed in this paper are in fact XRMA-secure. Can we 
have more efficient schemes by resorting to RMA-security? 3) Can we have tagged 
one-time signature schemes with tight reduction to the underlying simple assumptions? 
4) What is the exact lower bound for the size of signatures under simple assumptions? 
Is it possible to show such a bound?

Abstract. In this paper, we introduce the abstraction of Dual Form
Signatures as a useful framework for proving security (existential un- 
forgeability) from static assumptions for schemes with special structure 
that are used as a basis of other cryptographic protocols and applications. 
We demonstrate the power of this framework by proving security under 
static assumptions for close variants of pre-existing schemes: the LRSW- 
based Camenisch-Lysyanskaya signature scheme, and the identity-based 
sequential aggregate signatures of Boldyreva, Gentry, O’Neill, and Yum. 
The Camenisch-Lysyanskaya signature scheme was previously proven 
only under the interactive LRSW assumption, and our result can be 
viewed as a static replacement for the LRSW assumption. The scheme 
of Boldyreva, Gentry, O’Neill, and Yum was also previously proven only 
under an interactive assumption that was shown to hold in the generic 
group model. The structure of the public key signature scheme under- 
lying the BGOY aggregate signatures is quite distinctive, and our work 
presents the first security analysis of this kind of structure under static 
assumptions. 


1 Introduction 

Digital signatures are a fundamental technique for verifying the authenticity
of a digital message. The significance of digital signatures in cryptography is
also amplified by their use as building blocks for more complex cryptographic
protocols. Recently, we have seen several pairing-based signature schemes (e.g.,
[?]) that are both practical and have added structure which has been
used to build other primitives ranging from Aggregate Signatures [?] to
Oblivious Transfer [?]. Ideally, for such a fundamental cryptographic primitive
we would like to have security proofs from straightforward, static complexity
assumptions.

Meeting this goal for certain systems is often challenging. For instance, the
Camenisch and Lysyanskaya signature scheme [?] has been very influential as
it is used as the foundation for a wide variety of advanced cryptographic systems,
including anonymous credentials [?], group signatures [?], ecash [?],
uncloneable functions [?], batch verification [?], and RFID encryption [?]. While
the demonstrated utility of CL signatures has made them desirable, it has been
difficult to reduce their security to a static security assumption. Currently, the
CL signature scheme is proven secure under the LRSW assumption [?], an
interactive complexity assumption that closely mirrors the description of the
signature scheme itself. In addition, the interactive assumption transfers to the
systems built around these signatures.

The identity-based sequential aggregate signatures of Boldyreva, Gentry,
O’Neill, and Yum [?] were also proven in the random oracle model under
an interactive assumption (justified in the generic bilinear group model), which
again closely mirrors the underlying signature scheme itself. (This can be viewed
as providing a proof of the scheme only in the generic group model.) Proofs of
complicated interactive assumptions in the generic group model have several
disadvantages. First, they are themselves complex and prone to error. In fact,
the original version of the BGOY identity-based sequential aggregate signature
scheme [?] relied on an assumption that was shown to be false, and the scheme
was insecure [?]. The scheme and proof were corrected in [?]. Second, such
proofs do not tend to provide much insight into the security of the scheme.
This lack of insight tends to hinder transferring schemes to other settings. For
example, many schemes developed in bilinear groups now have lattice-based
analogs, and these transformations reused high-level ideas from the original
security proofs in the bilinear group setting. Techniques from [?] were used in the
lattice setting in [?], techniques from [?] were used in [?], and techniques from
[?] were used in [?]. This kind of transference of ideas from the bilinear setting
to the lattice setting is unlikely to be achieved through generic group proofs.

In this work, we develop techniques that can be applied to prove security 
from static assumptions for new signature schemes as well as (slight variants of) 
pre-existing schemes. Providing new proofs for these existing schemes provides 
a meaningful sanity check as well as new insight into their security. This kind of 
sanity check is valuable not only for schemes proven in the generic group model, 
but also for signatures (CL signatures included) that require extra checks to 
rule out trivial breaks (e.g. not allowing the message signed to be equal to 0), 
since these subtleties can easily be missed at first glance. Having new proofs
from static assumptions for variants of schemes like CL signatures and BGOY
signatures gives us additional confidence in their security without having to
sacrifice the variety of applications built from them. Ultimately, this provides
us with a fuller understanding of these kinds of signatures, and is a critical step
towards obtaining proofs under the simplest and weakest assumptions.

1 Throughout, we will be discussing the CL signatures based on the LRSW
assumption, which should not be confused with those based on the strong RSA assumption.

Dual Form Signatures. Our work is centered around a new abstraction that
we call Dual Form Signatures. Dual Form Signatures have similar structure to
existing signature schemes; however, they have two signing algorithms, Sign_A
and Sign_B, that respectively define two forms of signatures that will both verify
under the same public key. In addition, the security definition will categorize
forgeries into two disjoint types, Type I and Type II. Typically, these forgery
types will roughly correspond with signatures of form A and B.

In a Dual Form system, we will demand three security properties (stated
informally here):

A-I Matching. If an attacker is only given oracle access to Sign_A, then it is
hard to create any forgery that is not of Type I.

B-II Matching. If an attacker is only given oracle access to Sign_B, then it is
hard to create any forgery that is not of Type II.

Dual-Oracle Invariance. If an attacker is given oracle access to both Sign_A
and Sign_B and a “challenge signature” which is either from Sign_A or Sign_B,
then the attacker’s probability of producing a Type I forgery is approximately
the same when the challenge signature is from Sign_A as when the challenge
signature is from Sign_B.

A Dual Form Signature scheme immediately gives a secure signature scheme if
we simply set the signing algorithm Sign = Sign_A. Unforgeability now follows
from a hybrid argument. Consider any EUF-CMA [?] attacker $\mathcal{A}$. By the A-I
matching property, we know that it might have a noticeable probability $\epsilon$ of
producing a Type I forgery, but has only a negligible probability of producing
any other kind of forgery. We then show that $\epsilon$ must also be negligible. By the
dual-oracle invariance property, the probability of producing a Type I forgery
will be close to $\epsilon$ if we gradually replace the signing algorithm with Sign_B, one
signature at a time. Once all of the signatures the attacker receives are from
Sign_B, the B-II Matching property implies that the probability of producing a
Type I forgery must be negligible in the security parameter.
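
The hybrid argument just described can be written as a short chain of inequalities. What follows is only a sketch. The symbols $q$ (the number of signing queries made by the attacker), $\epsilon_i$ (the probability of a Type I forgery when the first $i$ queries are answered by Sign_B and the remaining ones by Sign_A) and $\nu$ (a negligible function bounding the advantage in each property) are notation introduced here for illustration and are not taken from the original text.

```latex
\begin{align*}
\epsilon_0 &= \epsilon
  && \text{(every query answered by } \mathrm{Sign}_A\text{)} \\
|\epsilon_i - \epsilon_{i+1}| &\le \nu(\lambda), \quad 0 \le i < q
  && \text{(dual-oracle invariance)} \\
\epsilon_q &\le \nu(\lambda)
  && \text{(B-II matching; Types I and II are disjoint)} \\
\epsilon &\le \epsilon_q + \sum_{i=0}^{q-1} |\epsilon_i - \epsilon_{i+1}|
  \le (q+1)\,\nu(\lambda) \\
\Pr[\text{attacker forges}] &\le \epsilon + \nu(\lambda) \le (q+2)\,\nu(\lambda)
  && \text{(A-I matching)}
\end{align*}
```

Since $q$ is polynomial in the security parameter, the final bound is negligible.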

We demonstrate the usefulness of our framework with two main applications,
using significantly different techniques. This illustrates the versatility of our
framework and its adaptability to schemes with different underlying structures.
In particular, while dual form signatures are related to the dual system
encryption methodology introduced by Waters [?] for proving full security of IBE
schemes and other advanced encryption functionalities, we demonstrate that our
dual form framework can be applied to signature schemes that have no known
encryption or IBE analogs. Though all of the applications given here use bilinear
groups, the dual form framework can be used in other contexts, including proofs
under general assumptions.

Our first application is a slight variant of the Camenisch-Lysyanskaya
signature scheme, set in a bilinear group $G$ of composite order $N = p_1 p_2 p_3$. This
application is surprising, since these signatures do not have a known IBE
analog. We let $G_{p_i}$ for each $i = 1, 2, 3$ denote the subgroup of order $p_i$ in the group.
The Sign_A algorithm produces signatures which exhibit the CL structure in the
$G_{p_1}$ and $G_{p_2}$ subgroups and are randomized in the $G_{p_3}$ subgroup. The Sign_B
algorithm produces signatures which exhibit the CL structure in the $G_{p_1}$
subgroup and are randomized in the $G_{p_2}$ and $G_{p_3}$ subgroups. Type I and II forgeries
roughly mirror signatures of form A and B. The verification procedure in our
scheme will verify that the signature is well formed in the $G_{p_1}$ subgroup, but
not “check” the other subgroups.

We prove security in the dual form framework based on three static subgroup
decision-type assumptions, similar to those used in [?]. The most challenging
part of the proof is dual-oracle invariance, which we prove by developing a
backdoor verification test (performed by the simulator) which acts as an
almost-perfect distinguisher between forgery types. Here we face a potential paradox,
which is similar to that encountered in dual system encryption [?]: we need
to create a simulator that does not know whether the challenge signature it
produces is distributed as an output of Sign_A or Sign_B, but it also must be able to
test the type of the attacker’s forgery. To arrange this, we create a “backdoor
verification” test, which the simulator can perform to test the form of all but
a small space of signatures. Essentially, this backdoor verification test acts as an
almost-perfect type distinguisher which fails to correctly determine the type of
only a very small set of potential forgeries.

The challenge signature of unknown form produced by the simulator will fall
within the untestable space; however, with very high probability a forgery by
an attacker will not, because some information about this space is
information-theoretically hidden from the attacker. This is possible because the elements of
the verification key are all in the subgroup $G_{p_1}$, and the space essentially resides
in $G_{p_2}$. Thus the verification key reveals no information about the hidden space.
The only information about the space that the attacker receives is contained
in the single signature of unknown type, and we show that this is insufficient
for the attacker to be able to construct a forgery that falls inside the space
for a different message. This is reminiscent of the concept of nominal
semi-functionality in dual system encryption (introduced in [?]): in this setting, the
simulator produces a key of unknown type which is correlated in its view with the
ciphertext it produces, but this correlation is information-theoretically hidden
from the attacker. This correlation prevents the simulator from determining the
type of the key for itself by testing decryption against a ciphertext.

As a second application of our dual form framework, we prove security from
static assumptions for a variant of the BGOY identity-based sequential
aggregate signature scheme. Aggregate signatures are useful because they allow
signature “compression,” meaning that any n individual signatures by n
(possibly) different signers on n (possibly) different messages can be transformed into
an aggregate signature of the same size as an individual one that nevertheless
allows verifying that all these signers signed their messages. However,
aggregate signatures do not provide compression of the public keys, which are needed
for signature verification. In the identity-based setting, only the identities of the
signers are needed; this is a big saving because identities are much shorter than
randomly generated keys. However, identity-based aggregate signatures have been
notoriously difficult to realize.

We first prove security for a basic public-key version of the scheme, and then
show that security for its identity-based sequential aggregate analog reduces to
security of the basic scheme (in the random oracle model, as for the original
proof). Our techniques here are significantly different, and reflect the different
structure of the scheme (it is this structure that allows for aggregation). The
core structure of the underlying public key scheme is composed of three group
elements of the form $g^{a+bm} g^{r_1 r_2}, g^{r_1}, g^{r_2}$, where $m$ is a message (or a hash of
the message), $a, b$ are fixed parameters, and $r_1, r_2$ are randomly chosen for each
signature. There are significant differences between this and the core structure of
other notable signatures, like CL and Waters signatures [?]. Here, the
message term is not multiplicatively randomized, but rather additively randomized
by the quadratic term $r_1 r_2$. It is the quadratic nature of this term that allows
verification via application of the bilinear map while thwarting attackers who try
to combine received signatures by taking linear combinations in the exponents.
This unique structure presents a challenge for static security analysis, and we
develop new techniques to achieve a proof for a variant of this scheme in our
dual form framework.

We still employ composite order subgroups, with the main structure of the
scheme reflected in the $G_{p_1}$ subgroup and the other two subgroups used for
differentiating between signature and forgery types. However, to prove dual-oracle
invariance, we rely on the fact that the scheme has the basic structure of a
one-time signature scheme embedded in it, in addition to the quadratic mechanism
to prevent an attacker from forming new signatures by taking combinations of
received signatures. We capture the security resulting from this combination of
structures through a static assumption for our dual-oracle invariance proof, and
we show that this assumption holds in the generic group model. Though we do
employ the generic group model as a check on our static assumptions, we believe
that our proof provides valuable intuition into the security of the scheme that is
not gleaned from a proof based on an interactive assumption or given solely in
the generic group model. Also, checking the security of a static assumption in
the generic group model is much easier (and less error-prone) than checking the
security of an interactive assumption or scheme. We believe that the techniques
and insights provided by our proof are an important step toward finding a prime
order variant of the scheme that is secure under more standard assumptions,
such as the decisional linear assumption.

In the full version, we provide one more application: a signature scheme
using the private key structure in the Lewko-Waters IBE system [?]. The LW
system itself can be viewed as a composite order extension of the Boneh-Boyen
selectively secure IBE scheme [?], although the structures of the proofs of these
systems are very different (LW achieves adaptive security). For this reason, we
call these “BB-derived” signatures. While the existing LW IBE system can be
transformed into a signature scheme using Naor's² general transformation, our
scheme checks the signature “directly” without going through an IBE
encryption. The resulting scheme has a constant number of elements in the public
key, and signatures consist of two group elements.

Further Directions. While we have focused here on applying our techniques for
core short signatures, we envision that dual form signatures will be a framework
for proving security of many different signature systems that have to this point
been difficult to analyze under static assumptions. Some examples include
signatures that embed additional structure, such as Attribute-Based signatures [?] and
Quoteable signatures [?]. Attribute-based signatures allow a signer to sign a message with
a predicate satisfied by his attributes, without revealing any additional
information about his attribute set. Our framework could potentially be applied to
obtain stronger security proofs for ABS schemes, such as the schemes of [?]
proved only in the generic group model. Quoteable signatures enable derivation
of signatures from each other under certain conditions, and current constructions
are proved only selectively secure [?]. Another future target is signatures that
“natively” sign group elements [?].

The primary goal of our work is providing techniques for realizing security
under static assumptions, and we leverage composite order groups as a
convenient setting for this. A natural future direction is to complement our work
by discovering prime order analogs of our techniques. Many previous systems
were originally constructed in composite order groups and later transferred into
prime order groups [?]. The general techniques presented in [?] do not seem
directly applicable here, but we emphasize that our dual form framework is not
tied to composite order groups and
could also be used in the prime order setting. Discussion of additional related
works can be found in the full version.

2 Dual Form Signatures 

We now define dual form signatures and their security properties. We then show 
that creating a secure dual form signature system naturally yields an existentially 
unforgeable signature scheme. We emphasize that the purpose of the dual form 
signature framework is to provide a template for creating security proofs from 
static assumptions, but the techniques employed to prove the required properties 
can be tailored to the structure of the particular scheme. 

Definition. We define a dual form signature system to have the following algo- 
rithms: 

KeyGen($\lambda$): Given a security parameter $\lambda$, generate a public key, VK, and a
private key, SK.
Sign_A(SK, M): Given a message, M, and the secret key, output a signature, $\sigma$.
Sign_B(SK, M): Given a message, M, and the secret key, output a signature, $\sigma$.
Verify(VK, M, $\sigma$): Given a message, the public key, and a signature, output
‘true’ or ‘false’.

2 Naor’s observation was noted in [?].

We note that a dual form signature scheme is identical to a usual signature 
scheme, except that it has two different signing algorithms. While only one sign- 
ing algorithm will be used in the resulting existentially unforgeable scheme, 
having two different signing algorithms will be useful in our proof of security. 

Forgery Classes. In addition to having two signature algorithms, the dual form
signature framework also considers two disjoint classes of forgeries. Whether
or not a signature verifies depends on the message that it signs as well as the
verification key. For a fixed verification key, we consider the set of pairs, $\mathcal{S} \times \mathcal{M}$,
over the message space, $\mathcal{M}$, and the signature space, $\mathcal{S}$. Consider the subset of
these pairs for which the Verify algorithm outputs ‘true’: we will denote this
set as $\mathcal{V}$.³ We let $\mathcal{V}_I$ and $\mathcal{V}_{II}$ denote two disjoint subsets of $\mathcal{V}$, and we refer to
signatures from these sets as Type I and Type II forgeries, respectively. In our
applications, we will have the property $\mathcal{V} = \mathcal{V}_I \cup \mathcal{V}_{II}$ in addition to $\mathcal{V}_I \cap \mathcal{V}_{II} = \emptyset$,
but only the latter property is necessary.

We will use these classes to specify two different types of forgeries received
from an adversary in our proof of security. In general, these classes are not
the same as the output ranges of our two signing algorithms. However, Type I
forgeries will be related to signatures output by the Sign_A algorithm and Type
II forgeries will be related to signatures output by the Sign_B algorithm. The
precise relationships between the forgery types and the signing algorithms are
explicitly defined by the following set of security properties for the dual form
system.

Security Properties. We define the following three security properties for a dual
form signature scheme. We consider an attacker $\mathcal{A}$ who is initially given the
verification key VK produced by running the key generation algorithm. The
value SK is also produced, but is not given to $\mathcal{A}$.

A-I Matching: Let $\mathcal{O}_A$ be an oracle for the algorithm Sign_A. More precisely,
this oracle takes a message as input, and produces a signature that is
identically distributed to an output of the Sign_A algorithm (for the SK produced
from the key generation). We say that a dual form signature is A-I matching
if for all probabilistic polynomial-time (PPT) algorithms, $\mathcal{A}$, there exists a
negligible function, negl($\lambda$), in the security parameter $\lambda$ such that:

$$\Pr[\mathcal{A}^{\mathcal{O}_A}(\mathrm{VK}) \notin \mathcal{V}_I] = \mathrm{negl}(\lambda).$$

This property guarantees that if an attacker is only given oracle access to
Sign_A, then it is hard to create anything but a Type I forgery.

3 Here we will assume that the Verify algorithm is deterministic. If we consider a
nondeterministic Verify algorithm, we could simply take the subset of ordered pairs
that are accepted by Verify with non-negligible probability.

B-II Matching: Let $\mathcal{O}_B$ be an oracle for the algorithm Sign_B (which takes in
a message and outputs a signature that is identically distributed to an output
of the Sign_B algorithm). We say that a dual form signature is B-II matching
if for all PPT algorithms, $\mathcal{A}$, there exists a negligible function, negl($\lambda$), such that:

$$\Pr[\mathcal{A}^{\mathcal{O}_B}(\mathrm{VK}) \notin \mathcal{V}_{II}] = \mathrm{negl}(\lambda).$$

This property guarantees that if an attacker is only given oracle access to
Sign_B, then it is hard to create anything but a Type II forgery.

Dual-Oracle Invariance (DOI): First we define the dual-oracle security
game.

1. The key generation algorithm is run, producing a verification key VK
and a secret key SK.

2. The adversary, $\mathcal{A}$, is given the verification key VK and oracle access to
$\mathcal{O}_0 = \mathrm{Sign}_A(\cdot)$ and $\mathcal{O}_1 = \mathrm{Sign}_B(\cdot)$.

3. $\mathcal{A}$ outputs a challenge message, $m$.

4. A random bit, $b \leftarrow \{0, 1\}$, is chosen, and then a signature $\sigma \leftarrow \mathcal{O}_b(m)$ is
computed and given to $\mathcal{A}$. We call $\sigma$ the challenge signature.

5. $\mathcal{A}$ continues to have oracle access to $\mathcal{O}_0$ and $\mathcal{O}_1$.

6. $\mathcal{A}$ outputs a forgery pair $(m^*, \sigma^*)$, where $\mathcal{A}$ has not already received a
signature for $m^*$.

We say that a dual form signature scheme has dual-oracle invariance if, for all
PPT attackers $\mathcal{A}$, there exists a negligible function, negl($\lambda$), in the security
parameter $\lambda$ such that

$$\left|\Pr[(m^*, \sigma^*) \in \mathcal{V}_I \mid b = 1] - \Pr[(m^*, \sigma^*) \in \mathcal{V}_I \mid b = 0]\right| = \mathrm{negl}(\lambda).$$

We say that a dual form signature scheme is secure if it satisfies all three of these
security properties.
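
The six steps of the dual-oracle game can be sketched as a small test harness. This is a minimal illustration, not part of the original paper. The function `dual_oracle_game`, the adversary interface (`choose`, `forge`), the predicate `is_type_one`, and the `stub_*` scheme are all hypothetical names introduced here. The stub scheme is deliberately degenerate (anyone can "sign", and the form is visible) and exists only to exercise the harness.

```python
import secrets

def dual_oracle_game(keygen, sign_a, sign_b, verify, is_type_one, adversary, b=None):
    """Run the dual-oracle game once; return True iff the adversary outputs
    a valid Type I forgery on a message it never received a signature for."""
    vk, sk = keygen()                                    # step 1
    seen = set()                                         # messages already signed

    def make_oracle(sign):
        def oracle(m):
            seen.add(m)
            return sign(sk, m)
        return oracle

    o0, o1 = make_oracle(sign_a), make_oracle(sign_b)    # step 2
    m = adversary.choose(vk, o0, o1)                     # step 3
    if b is None:
        b = secrets.randbelow(2)                         # step 4: random bit
    challenge = (o0 if b == 0 else o1)(m)                # challenge signature
    m_star, sig_star = adversary.forge(challenge, o0, o1)  # steps 5 and 6
    if m_star in seen:                                   # forgery must be fresh
        return False
    return bool(verify(vk, m_star, sig_star) and is_type_one(sk, m_star, sig_star))

# Degenerate stub scheme (NOT secure): a "signature" is just (form, message).
def stub_keygen():
    return "vk", "sk"

def stub_sign_a(sk, m):
    return ("A", m)

def stub_sign_b(sk, m):
    return ("B", m)

def stub_verify(vk, m, sig):
    return sig[1] == m and sig[0] in ("A", "B")

def stub_is_type_one(sk, m, sig):
    return sig[0] == "A"

class CopyFormAdversary:
    """Forges on a fresh message, copying the form of the challenge signature."""
    def choose(self, vk, o0, o1):
        return "challenge"

    def forge(self, challenge, o0, o1):
        return "fresh", (challenge[0], "fresh")
```

Against the stub, `CopyFormAdversary` outputs a Type I forgery exactly when b = 0, so the two probabilities in the definition differ by 1. A scheme with dual-oracle invariance must rule out this kind of dependence on the challenge form.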

Secure Signature Scheme. Once we have developed a secure dual form
signature system, $(\mathrm{KeyGen}^{DF}, \mathrm{Sign}_A^{DF}, \mathrm{Sign}_B^{DF}, \mathrm{Verify}^{DF})$, this system immediately
implies a secure signature scheme. The secure scheme is constructed as follows:

Construction 1. $\mathrm{KeyGen} = \mathrm{KeyGen}^{DF}$, $\mathrm{Sign} = \mathrm{Sign}_A^{DF}$, $\mathrm{Verify} = \mathrm{Verify}^{DF}$.

Our new secure signature scheme is identical to the dual form system except
that we have arbitrarily chosen to use Sign_A as our signing algorithm. We could
have equivalently elected to use Sign_B. (In which case, we would modify the
dual-oracle invariance property to be with respect to Type II forgeries instead
of Type I forgeries. Alternatively, we could strengthen the property to address
both forgery types.) Now we will prove that this signature scheme is secure.

In the full version, we prove (the argument is rather straightforward):

Theorem 1. If $\Pi = (\mathrm{KeyGen}^{DF}, \mathrm{Sign}_A^{DF}, \mathrm{Sign}_B^{DF}, \mathrm{Verify}^{DF})$ is a secure dual
form signature scheme, then Construction 1, $(\mathrm{KeyGen}^{DF}, \mathrm{Sign}_A^{DF}, \mathrm{Verify}^{DF})$,
is existentially unforgeable under an adaptive chosen message attack.

3 Background on Composite Order Bilinear Groups

Composite order bilinear groups were first introduced in [?]. We define a group
generator $\mathcal{G}$, an algorithm which takes a security parameter $\lambda$ as input and
outputs a description of a bilinear group $G$. In our case, we will have $\mathcal{G}$ output
$(N = p_1 p_2 p_3, G, G_T, e)$ where $p_1, p_2, p_3$ are distinct primes, $G$ and $G_T$ are cyclic
groups of order $N = p_1 p_2 p_3$, and $e : G^2 \to G_T$ is a map such that:

1. (Bilinear) $\forall g, h \in G$, $a, b \in \mathbb{Z}_N$, $e(g^a, h^b) = e(g, h)^{ab}$

2. (Non-degenerate) $\exists g \in G$ such that $e(g, g)$ has order $N$ in $G_T$.

Computing $e(g, h)$ is also commonly referred to as “pairing” $g$ with $h$.

We assume that the group operations in $G$ and $G_T$ as well as the bilinear
map $e$ are computable in polynomial time with respect to $\lambda$, and that the group
descriptions of $G$ and $G_T$ include generators of the respective cyclic groups.
We let $G_{p_1}$, $G_{p_2}$, and $G_{p_3}$ denote the subgroups of order $p_1$, $p_2$, and $p_3$ in $G$
respectively. We note that when $h_i \in G_{p_i}$ and $h_j \in G_{p_j}$ for $i \neq j$, $e(h_i, h_j)$ is the
identity element in $G_T$. To see this, suppose we have $h_1 \in G_{p_1}$ and $h_2 \in G_{p_2}$. Let
$g$ denote a generator of $G$. Then, $g^{p_1 p_2}$ generates $G_{p_3}$, $g^{p_1 p_3}$ generates $G_{p_2}$, and
$g^{p_2 p_3}$ generates $G_{p_1}$. Hence, for some $\alpha_1, \alpha_2$, $h_1 = (g^{p_2 p_3})^{\alpha_1}$ and $h_2 = (g^{p_1 p_3})^{\alpha_2}$.
Then:

$$e(h_1, h_2) = e(g^{p_2 p_3 \alpha_1}, g^{p_1 p_3 \alpha_2}) = e(g^{\alpha_1}, g^{p_3 \alpha_2})^{p_1 p_2 p_3} = 1.$$

This orthogonality property of $G_{p_1}$, $G_{p_2}$, $G_{p_3}$ is a useful feature of composite
order bilinear groups which we leverage in our constructions and proofs.
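
The orthogonality property can be checked numerically in a toy "exponent model" of a bilinear group. This is a sketch under stated assumptions: every element $g^u$ is stored as its exponent $u$ modulo $N$, and the pairing is stored as the product of exponents. The small primes and the helper names `pairing` and `subgroup_element` are chosen here for illustration. The model is not a cryptographic group, since discrete logarithms are in plain view.

```python
# Toy model: an element g^u of G is stored as u mod N. Multiplying group
# elements adds exponents, and e(g^u, g^v) = e(g, g)^(u*v) is stored as
# u*v mod N, so the identity of G_T is represented by 0.
p1, p2, p3 = 1009, 1013, 1019    # small distinct primes, for illustration only
N = p1 * p2 * p3

def pairing(u, v):
    return (u * v) % N

def subgroup_element(i, alpha):
    """Exponent of (g^(N/p_i))^alpha, an element of the order-p_i subgroup."""
    cofactor = N // (p1, p2, p3)[i - 1]
    return (cofactor * alpha) % N

h1 = subgroup_element(1, 123)    # element of G_{p1}
h2 = subgroup_element(2, 456)    # element of G_{p2}
h3 = subgroup_element(3, 789)    # element of G_{p3}

# Elements of different prime-order subgroups pair to the identity,
# while an element of G_{p1} paired with itself does not.
cross_pairings = [pairing(h1, h2), pairing(h1, h3), pairing(h2, h3)]
self_pairing = pairing(h1, h1)
```

Each cross pairing has an exponent divisible by $p_1 p_2 p_3$, which is the same cancellation used in the displayed equation.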

If we let $g_1, g_2, g_3$ denote generators of the subgroups $G_{p_1}$, $G_{p_2}$, and $G_{p_3}$
respectively, then every element $h$ in $G$ can be expressed as $h = g_1^a g_2^b g_3^c$ for some
$a, b, c \in \mathbb{Z}_N$. We refer to $g_1^a$ as the “$G_{p_1}$ part” or “$G_{p_1}$ component” of $h$. If we
say that an $h$ has no $G_{p_2}$ component, for example, we mean that $b \equiv 0 \bmod p_2$.
Below, we will often use $g$ to denote an element of $G_{p_1}$ (as opposed to writing
$g_1$).
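
The decomposition of an element into its three subgroup components follows from the Chinese Remainder Theorem. The sketch below works in a toy model where the element $g^u$ is stored as the exponent $u$ modulo $N$. The function name `components` and the small primes are assumptions made for illustration only.

```python
p1, p2, p3 = 1009, 1013, 1019    # small distinct primes, for illustration only
N = p1 * p2 * p3

def components(u):
    """Split h = g^u into the exponents of its G_{p1}, G_{p2}, G_{p3} parts."""
    parts = []
    for p in (p1, p2, p3):
        cofactor = N // p
        # CRT idempotent: congruent to 1 mod p and to 0 mod the other primes.
        # pow(x, -1, p) computes a modular inverse (Python 3.8 or later).
        idem = cofactor * pow(cofactor, -1, p) % N
        parts.append(u * idem % N)
    return tuple(parts)

u = 123456789
c1, c2, c3 = components(u)
```

The three parts multiply back to $h$ (their exponents sum to $u$ modulo $N$), and an element has no $G_{p_2}$ component exactly when its exponent is divisible by $p_2$.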

The original Camenisch-Lysyanskaya scheme and BGOY identity-based
sequential aggregate signature scheme both use prime order bilinear groups, i.e.
groups $G$ and $G_T$ are each of prime order $q$ with an efficiently computable
bilinear map $e : G^2 \to G_T$.

4 Camenisch-Lysyanskaya Signatures 

Now we use the dual form framework to prove security of a signature scheme
similar to the one put forward by Camenisch and Lysyanskaya [?]. The
Camenisch-Lysyanskaya signature scheme was already shown to be secure under the LRSW
assumption. However, the scheme can be naturally adapted to our framework,
allowing us to prove security under static, non-interactive assumptions. Our
result is not strictly comparable to the result under the LRSW assumption because
our signature scheme is not identical to the original. However, this is the first
proof of security for a scheme similar to the Camenisch-Lysyanskaya signature
scheme from static, non-interactive assumptions.

Our signature scheme will use bilinear groups, $G$ and $G_T$, of composite order
$N = p_1 p_2 p_3$, where $p_1$, $p_2$, and $p_3$ are all distinct primes. Our construction is
identical to the original Camenisch-Lysyanskaya signature scheme in the $G_{p_1}$
subgroup, but with additional components in the subgroups $G_{p_2}$ and $G_{p_3}$. The
signatures produced by the Sign_A algorithm will have random components in
$G_{p_3}$ and components in $G_{p_2}$ which mirror the structure of the scheme in $G_{p_1}$.
The signatures produced by the Sign_B algorithm will have random components
in both $G_{p_2}$ and $G_{p_3}$. Type I forgeries are those that are distributed exactly
like Sign_A signatures in the $G_{p_2}$ subgroup, while Type II forgeries encompass all
other distributions.

To prove dual-oracle invariance, we develop a backdoor verification test that
the simulator can use to determine the type of the attacker’s forgery. We
leverage the fact that the simulator will know the discrete logarithms of the public
parameters, which will allow it to strip off the components in $G_{p_1}$ in the forgery
and check the distribution of the $G_{p_2}$ components. This check will fail to
determine the type correctly only with negligible probability. In more detail, we
create a simulator which must solve a subgroup decision problem and ascertain
whether an element $T$ is in $G_{p_1 p_3}$ or in the full group $G$. It will use $T$ to create
a challenge signature which is either distributed as an output of the Sign_A
algorithm or as an output of the Sign_B algorithm, depending on the nature of $T$.
It will be unable to determine the nature of this signature for itself because this
will fall into the negligible error space of its backdoor verification test. When the
simulator receives a forgery from the attacker, it will perform the backdoor
verification test and correctly determine the type of the forgery, unless the attacker
manages to produce a forgery for which this test fails. This will occur only with
negligible probability, because the attacker will have only limited information
about the error space from the challenge signature, and it needs to forge on a
different message. This is possible because the public parameters are in $G_{p_1}$, and
so reveal no information about the error space of the backdoor test modulo $p_2$.
We use a pairwise independence argument to show that the limited amount of
information the attacker can glean from the challenge signature on a message $m$
is insufficient for it to produce a forgery for a different message $m^*$ that causes
the backdoor test to err.

4.1 Our Dual Form Scheme 

KeyGen($\lambda$): The key generation algorithm chooses two groups, $G = \langle g \rangle$ and
$G_T$, of order $N = p_1 p_2 p_3$ (where $p_1$, $p_2$, and $p_3$ are all distinct primes of
length $\lambda$) that have a non-degenerate, efficiently computable bilinear map,
$e : G \times G \to G_T$. It then selects uniformly at random $g \in G_{p_1}$, $g_3 \in G_{p_3}$,
$g_{2,3} \in G_{p_2 p_3}$, and $x, y, x_e, y_e \in \mathbb{Z}_N$. It sets

$$SK = (x, y, x_e, y_e, g_3, g_{2,3}),$$

and

$$PK = (N, G, g, X = g^x, Y = g^y).$$

Sign_A(SK, m): Given a secret key $(x, y, x_e, y_e, g_3, g_{2,3})$, a public key $(N, G, g, X, Y)$,
and a message $m \in \mathbb{Z}_N^*$, the algorithm chooses random $r, r' \in \mathbb{Z}_N$, $R_{2,3} \in
G_{p_2 p_3}$, and random $R_3$, $R_3'$, and $R_3'' \in G_{p_3}$, and outputs the signature

$$\sigma = \left(g^r R_{2,3}^{r'} R_3,\ (g^r)^y (R_{2,3}^{r'})^{y_e} R_3',\ (g^r)^{x+mxy} (R_{2,3}^{r'})^{x_e + m x_e y_e} R_3''\right).$$

Note that the random elements of $G_{p_3}$ can be obtained by raising $g_3$ to
random exponents modulo $N$. Likewise, the random elements of $G_{p_2 p_3}$ can
be obtained by raising $g_{2,3}$ to random exponents modulo $N$. The random
exponents modulo $N$ will be uncorrelated modulo $p_2$ and modulo $p_3$ by the
Chinese Remainder Theorem.

Sign_B(SK, m): Given a secret key $(x, y, x_e, y_e, g_3, g_{2,3})$, a public key $(N, G, g, X, Y)$,
and a message $m \in \mathbb{Z}_N^*$, the algorithm chooses a random $r \in \mathbb{Z}_N$ and random
$R_{2,3}$, $R_{2,3}'$, and $R_{2,3}'' \in G_{p_2 p_3}$, and outputs the signature

$$\sigma = \left(g^r R_{2,3},\ (g^r)^y R_{2,3}',\ (g^r)^{x+mxy} R_{2,3}''\right).$$

The random elements can be generated in the same way as in Sign_A.
Verify(VK, to, a): Given a public key pk = (N, G, g, X, Y), message to ^ 0, and 
a signature a = {cri,a 2 ,a 3 ), the verification algorithm checks that: 

e((Ji,g) ^ 1 

(which ensures that ui 0 G P2P3 ), and 

e(cri,y) = e(g,a 2 ) and e(X,ai) ■ e(X,cr 2 ) m = e(g,a 3 ). 

As in the original CL scheme, messages must be chosen from $\mathbb{Z}_N^*$, so that $m \neq 0$. 

If we allow $m = 0$, then an adversary can easily forge a valid signature using the 
public key elements $(g, Y, X)$. Also like the original scheme, the Verify algorithm 
will not accept a signature where all the elements are the identity in $G_{p_1}$. It 
suffices to check that the first element is not the identity in $G_{p_1}$ and that the 
other verification equations are satisfied. If $\sigma_1$ is the identity in $G_{p_1}$, then it will 
be an element of the subgroup $G_{p_2 p_3}$. To determine if $\sigma_1 \in G_{p_2 p_3}$, we pair $\sigma_1$ 
with the public key element $g$ under the bilinear map and verify that it does 
not equal the identity in $G_T$. Without this check, a signature where all three 
elements are members of the subgroup $G_{p_2 p_3}$ would be valid for any message 
with the randomness $r' = 0 \bmod p_1$. 
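The algorithms above can be exercised in a toy model. The following is only a sketch under loud assumptions: group elements are represented by their discrete logarithms modulo $N$ with respect to a hypothetical generator $w$ of $G$, so the "pairing" is just multiplication of exponents, and the primes are tiny. Nothing here is secure or taken from the authors' implementation; the names `pair`, `rand_unit`, `sign_a`, `sign_b` and `verify` are illustrative.

```python
import random

# Toy parameters (assumption): tiny primes stand in for the lambda-bit primes.
P1, P2, P3 = 1009, 1013, 1019
N = P1 * P2 * P3
rng = random.Random(2012)

def pair(a, b):
    """Toy symmetric pairing on discrete logs: e(w^a, w^b) = e(w, w)^(a*b)."""
    return a * b % N

def rand_unit():
    """Random exponent that is non-zero modulo p1, p2 and p3 (toy convenience)."""
    while True:
        t = rng.randrange(1, N)
        if t % P1 and t % P2 and t % P3:
            return t

def keygen():
    g = P2 * P3 * rand_unit() % N      # log of a generator of G_{p1}
    g3 = P1 * P2 * rand_unit() % N     # log of a generator of G_{p3}
    g23 = P1 * rand_unit() % N         # log of a generator of G_{p2 p3}
    x, y, xe, ye = (rand_unit() for _ in range(4))
    sk = dict(x=x, y=y, xe=xe, ye=ye, g=g, g3=g3, g23=g23)
    pk = dict(g=g, X=g * x % N, Y=g * y % N)
    return sk, pk

def sign_a(sk, m):
    r, r2 = rand_unit(), rand_unit()
    big_r = sk["g23"] * rand_unit() % N * r2 % N          # R_{2,3}^{r'}
    r3, r3p, r3pp = (sk["g3"] * rand_unit() % N for _ in range(3))
    gr = sk["g"] * r % N                                  # g^r
    s1 = (gr + big_r + r3) % N
    s2 = (gr * sk["y"] + big_r * sk["ye"] + r3p) % N
    s3 = (gr * (sk["x"] + m * sk["x"] * sk["y"])
          + big_r * (sk["xe"] + m * sk["xe"] * sk["ye"]) + r3pp) % N
    return s1, s2, s3

def sign_b(sk, m):
    gr = sk["g"] * rand_unit() % N
    r, rp, rpp = (sk["g23"] * rand_unit() % N for _ in range(3))
    return ((gr + r) % N,
            (gr * sk["y"] + rp) % N,
            (gr * (sk["x"] + m * sk["x"] * sk["y"]) + rpp) % N)

def verify(pk, m, sig):
    s1, s2, s3 = sig
    if m % N == 0 or pair(s1, pk["g"]) == 0:   # e(sigma_1, g) must not be 1
        return False
    return (pair(s1, pk["Y"]) == pair(pk["g"], s2)
            and (pair(pk["X"], s1) + m * pair(pk["X"], s2)) % N == pair(pk["g"], s3))
```

In this model the $G_{p_2 p_3}$ parts vanish whenever they are paired with an element of $G_{p_1}$, which is why both signing algorithms pass the same public verification, and why a triple lying entirely in $G_{p_2 p_3}$ is rejected only by the first check.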

Notice that, until Sign_A is called, no information about the exponents $x_e$ and 
$y_e$ is given out. Once Sign_A is called, these exponents behave exactly like the 
secret key exponents $x$ and $y$, except in the $G_{p_2}$ subgroup. These exponents 
will be used to verify that a forgery is of Type I. The additional randomization 
with the $G_{p_3}$ elements guarantees that there will be no correlation in the $G_{p_3}$ 
subgroup between the three signature elements. Unlike the signatures given out 
by the Sign_A algorithm, signatures from the Sign_B algorithm will be completely 
randomized in the $G_{p_2}$ subgroup as well. 

Forgery Classes. We will divide verifiable forgeries according to their correlation 
in the $G_{p_2}$ subgroup, similar to the way we have defined the signatures from the 
Sign_A and Sign_B algorithms. We let $z$ be an exponent in $\mathbb{Z}_N$. By the Chinese 
Remainder Theorem, we can represent $z$ as an ordered tuple $(z_1, z_2, z_3) \in \mathbb{Z}_{p_1} \times 
\mathbb{Z}_{p_2} \times \mathbb{Z}_{p_3}$, where $z_1 = z \bmod p_1$, $z_2 = z \bmod p_2$, and $z_3 = z \bmod p_3$. Letting 
$(z_1, z_2, z_3) = (0 \bmod p_1, 1 \bmod p_2, 0 \bmod p_3)$ and $g_2$ be a generator of $G_{p_2}$, we 
define the forgery classes as follows: Type I forgeries are of the form 

$$V_I = \{ (m^*, \sigma^*) \in V \mid (\sigma_1^*)^z = g_2^{r'},\ (\sigma_2^*)^z = g_2^{r' y_e},\ (\sigma_3^*)^z = g_2^{r'(x_e + m^* x_e y_e)} \text{ for some } r' \},$$

while Type II forgeries are of the form $V_{II} = \{ (m^*, \sigma^*) \in V \mid (m^*, \sigma^*) \notin V_I \}$. 

Essentially, Type I forgeries will be correlated in the $G_{p_2}$ subgroup exactly 
in the same way as they are correlated in the $G_{p_1}$ subgroup, with the exponents 
$x_e$ and $y_e$ playing the same role in the $G_{p_2}$ subgroup that $x$ and $y$ play 
in the $G_{p_1}$ subgroup. Type I forgeries will align with the Sign_A algorithm, to 
guarantee that our scheme is A-I matching. Type II forgeries include any other 
verifiable signatures, i.e., those not correctly correlated in the $G_{p_2}$ subgroup. Unlike 
the signatures produced by the Sign_B algorithm, Type II forgeries need not 
be completely random in the $G_{p_2}$ subgroup. However, we will show in our proof 
of security that this is enough to guarantee B-II matching. 
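The exponent $z = (0 \bmod p_1, 1 \bmod p_2, 0 \bmod p_3)$ used to define the forgery classes acts as a projection onto the $G_{p_2}$ component. The sketch below illustrates this in a toy setting chosen purely for illustration: the order-105 subgroup of $\mathbb{Z}_{211}^*$ with $(p_1, p_2, p_3) = (3, 5, 7)$. The helper names `crt_selector` and `subgroup_generators` are hypothetical.

```python
from math import prod

def crt_selector(primes, index):
    """Exponent z with z = 1 mod primes[index] and z = 0 mod every other prime."""
    n = prod(primes)
    cofactor = n // primes[index]
    return cofactor * pow(cofactor, -1, primes[index]) % n

# Toy composite-order group (assumption): the order-105 subgroup of Z_211^*.
PRIMES = (3, 5, 7)
N = prod(PRIMES)      # 105
Q = 2 * N + 1         # 211 is prime, so Z_211^* contains a subgroup of order N

def subgroup_generators():
    """Return (g1, g2, g3) generating the subgroups of order 3, 5 and 7."""
    for a in range(2, Q):
        h = pow(a, 2, Q)                      # squares lie in the order-105 subgroup
        if all(pow(h, N // p, Q) != 1 for p in PRIMES):
            return tuple(pow(h, N // p, Q) for p in PRIMES)
    raise ValueError("no generator found")

g1, g2, g3 = subgroup_generators()
z = crt_selector(PRIMES, 1)                   # (0 mod 3, 1 mod 5, 0 mod 7)
u = pow(g1, 2, Q) * pow(g2, 3, Q) * pow(g3, 4, Q) % Q
projected = pow(u, z, Q)                      # only the order-5 component survives
```

Raising to $z$ kills the $G_{p_1}$ and $G_{p_3}$ components and leaves the $G_{p_2}$ component unchanged, which is how a simulator knowing the factorization can test the form of $(\sigma_i^*)^z$.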

4.2 Complexity Assumptions 

We now state our complexity assumptions. We let $G$ and $G_T$ denote two cyclic 
groups of order $N = p_1 p_2 p_3$, where $p_1$, $p_2$, and $p_3$ are distinct primes, and 
$e : G^2 \to G_T$ is an efficient, non-degenerate bilinear map. In addition, we will 
denote the subgroup of $G$ of order $p_1 p_2$ as $G_{p_1 p_2}$, for example. 

The first two of these assumptions were introduced in EH, where it is proven 
that these assumptions hold in the generic group model, assuming it is hard to 
find a non-trivial factor of the group order, N. These are specific instances of the 
General Subgroup Decision Assumption described in j^j. The third assumption 
is new, and in the full version we prove that it also holds in the generic group 
model, assuming it is hard to find a non-trivial factor of the group order, N. 

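Assumption 4.1. Given a group generator $\mathcal{G}$, we define the following distribution: 

$$(N = p_1 p_2 p_3, G, G_T, e) \xleftarrow{R} \mathcal{G},$$
$$g, X_1 \xleftarrow{R} G_{p_1}, \quad X_2 \xleftarrow{R} G_{p_2}, \quad X_3 \xleftarrow{R} G_{p_3},$$
$$D = (N, G, G_T, e, g, X_1 X_2, X_3),$$
$$T_1 \xleftarrow{R} G_{p_1 p_2}, \quad T_2 \xleftarrow{R} G_{p_1}.$$

We define the advantage of an algorithm, $\mathcal{A}$, in breaking Assumption 4.1 to be: 

$$Adv^{4.1}_{\mathcal{G},\mathcal{A}}(\lambda) := \left| \Pr[\mathcal{A}(D, T_1) = 1] - \Pr[\mathcal{A}(D, T_2) = 1] \right|.$$

Definition 1. We say that $\mathcal{G}$ satisfies Assumption 4.1 if for any polynomial 
time algorithm, $\mathcal{A}$, $Adv^{4.1}_{\mathcal{G},\mathcal{A}}(\lambda)$ is a negligible function of $\lambda$. 

Assumption 4.2. Given a group generator $\mathcal{G}$, we define the following distribution: 

$$(N = p_1 p_2 p_3, G, G_T, e) \xleftarrow{R} \mathcal{G},$$
$$g, X_1 \xleftarrow{R} G_{p_1}, \quad X_2, Y_2 \xleftarrow{R} G_{p_2}, \quad X_3, Y_3 \xleftarrow{R} G_{p_3},$$
$$D = (N, G, G_T, e, g, X_1 X_2, X_3, Y_2 Y_3),$$
$$T_1 \xleftarrow{R} G, \quad T_2 \xleftarrow{R} G_{p_1 p_3}.$$

We define the advantage of an algorithm, $\mathcal{A}$, in breaking Assumption 4.2 to be: 

$$Adv^{4.2}_{\mathcal{G},\mathcal{A}}(\lambda) := \left| \Pr[\mathcal{A}(D, T_1) = 1] - \Pr[\mathcal{A}(D, T_2) = 1] \right|.$$

Definition 2. We say that $\mathcal{G}$ satisfies Assumption 4.2 if for any polynomial 
time algorithm, $\mathcal{A}$, $Adv^{4.2}_{\mathcal{G},\mathcal{A}}(\lambda)$ is a negligible function of $\lambda$. 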

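Assumption 4.3. Given a group generator $\mathcal{G}$, we define the following distribution: 

$$(N = p_1 p_2 p_3, G, G_T, e) \xleftarrow{R} \mathcal{G},$$
$$a, r \xleftarrow{R} \mathbb{Z}_N, \quad g \xleftarrow{R} G_{p_1}, \quad X_2, X_2', X_2'', Z_2 \xleftarrow{R} G_{p_2}, \quad X_3 \xleftarrow{R} G_{p_3},$$
$$D = (N, G, G_T, e, g, g^a, g^r X_2, g^{ra} X_2', g^{ra^2} X_2'', Z_2, X_3).$$

We define the advantage of an algorithm, $\mathcal{A}$, in breaking Assumption 4.3 to be: 

$$Adv^{4.3}_{\mathcal{G},\mathcal{A}}(\lambda) := \Pr[\mathcal{A}(D) = (g^{r' a^2} R_3, g^{r'} R_3') \text{ and } r' \neq 0 \bmod p_1],$$

where $R_3$ and $R_3'$ are any values in the subgroup $G_{p_3}$. 

Definition 3. We say that $\mathcal{G}$ satisfies Assumption 4.3 if for any polynomial 
time algorithm, $\mathcal{A}$, $Adv^{4.3}_{\mathcal{G},\mathcal{A}}(\lambda)$ is a negligible function of $\lambda$. 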

Proof of Security. In the full version, we prove that our signature scheme is 
secure under these assumptions by proving that it satisfies the three properties 
of a secure dual form signature scheme. 

5 BGOY Signatures 

Here we give a public key variant of the BGOY signatures and prove existential 
unforgeability using our dual form framework. In the full version, we show how 
this base scheme can be built into an identity-based sequential aggregate signa- 
ture scheme and reduce the security of the aggregate scheme to the security of 
this base scheme, in the random oracle model. We will also employ the random 
oracle model in our proof for the base scheme, although this use of the random 
oracle can be removed (see below for discussion of this). 

Our techniques here are quite different from those employed for the BB-derived 
and CL signature variants, and they reflect the different structure of 
this scheme. There are some basic commonalities, however: we again employ 
a bilinear group of order $N = p_1 p_2 p_3$, and the main structure of the scheme 
occurs in the $G_{p_1}$ subgroup. The signatures produced by the Sign_A algorithm 
contain group elements which are only in $G_{p_1}$, while the signatures produced by 
the Sign_B algorithm additionally have components in $G_{p_3}$. These components 
in $G_{p_3}$ are not fully randomized each time and do not occur on all signature 
elements: they occur only on three signature elements, and the ratio between 
two of their exponents is the same for all Sign_B signatures. Our forgery types 
will be defined in terms of the subgroups present on two of the elements in the 
forgery. 

We design our proof to reflect the structure of the scheme, which essentially 
combines a one-time signature with a mechanism to prevent an attacker from 
producing new signatures from linear combinations of old signatures in the exponent. 
In proving dual-oracle invariance, we leverage these structures by first 
changing the challenge signature from an output of Sign_A to a signature that 
has components in $G_{p_2}$, and then changing it to an output of Sign_B. It is crucial 
to note that as we proceed through this intermediary step, the challenge 
signature is the only signature which has any non-zero components in $G_{p_2}$. This 
allows us to argue that as we make this transition, an attacker cannot change 
from producing Type I forgeries (which do not have $G_{p_2}$ components on certain 
elements) to producing forgeries which do have non-zero $G_{p_2}$ components in the 
relevant locations. Intuitively, such an attacker would violate the combination 
of one-time security and inability to combine signatures, since the attacker has 
only received one signature with $G_{p_2}$ elements, and it cannot combine this with 
any other signatures to produce a forgery on a new message. These aspects seem 
hard to capture when working directly in a prime order rather than composite 
order group. (We note, however, that the one-time aspect was also implicit 
in the security proof of the Gentry-Ramzan scheme [?] on which the BGOY 
scheme was based; however, differences in the schemes prevent capturing it in 
the same way for the latter.) The techniques here are also quite different from 
those used in our proofs for CL and BB-derived signatures: here there is no 
backdoor verification test or pairwise-independence argument. 

5.1 The Dual Form Scheme 

KeyGen($\lambda$) $\to$ VK, SK. The key generation algorithm chooses a bilinear group 
$G$ of order $N = p_1 p_2 p_3$. It chooses two random elements $g, k \in G_{p_1}$, random 
elements $g_3, g_3^d \in G_{p_3}$, and random exponents $a_1, a_2, b_1, b_2, \alpha_1, \alpha_2, \beta_1, \beta_2 \in \mathbb{Z}_N$. 
It also chooses a function $H : \{0,1\}^* \to \mathbb{Z}_N$ which will be modeled as a random 
oracle. It sets the verification key as 

$$VK := \{N, H, G, g, k, g^{a_1}, g^{a_2}, g^{b_1}, g^{b_2}, g^{\alpha_1}, g^{\alpha_2}, g^{\beta_1}, g^{\beta_2}\}$$

and the secret key as 

$$SK := \{N, H, G, g, k, g^{a_1 a_2}, g^{b_1 b_2}, g^{\alpha_1 \alpha_2}, g^{\beta_1 \beta_2}, g_3, g_3^d\}.$$

Sign_A(m, SK) $\to \sigma$. The Sign_A algorithm takes in a message $m \in \{0,1\}^*$. It 
chooses two random exponents $r_1, r_2 \in \mathbb{Z}_N$, and computes: 

$$\sigma_1 := g^{a_1 a_2 + b_1 b_2 H(m)} g^{r_1 r_2}, \quad \sigma_2 := g^{r_1}, \quad \sigma_3 := g^{r_2},$$
$$\sigma_4 := k^{r_2}, \quad \sigma_5 := g^{\alpha_1 \alpha_2 + \beta_1 \beta_2 H(m)} k^{r_1 r_2}.$$

It outputs the signature $\sigma := (\sigma_1, \sigma_2, \sigma_3, \sigma_4, \sigma_5)$. 


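Sign_B(m, SK) $\to \sigma$. The Sign_B algorithm takes in a message $m \in \{0,1\}^*$. It 
chooses random exponents $r_1, r_2, x, y \in \mathbb{Z}_N$, and computes: 

$$\sigma_1 := g^{a_1 a_2 + b_1 b_2 H(m)} g^{r_1 r_2} g_3^{x}, \quad \sigma_2 := g^{r_1} g_3^{y}, \quad \sigma_3 := g^{r_2},$$
$$\sigma_4 := k^{r_2}, \quad \sigma_5 := g^{\alpha_1 \alpha_2 + \beta_1 \beta_2 H(m)} k^{r_1 r_2} (g_3^d)^{x}.$$

It outputs the signature $\sigma := (\sigma_1, \sigma_2, \sigma_3, \sigma_4, \sigma_5)$. 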

Verify(m, $\sigma$, VK) $\to$ {True, False}. The verification algorithm first checks that: 

$$e(\sigma_1, g) = e(g^{a_1}, g^{a_2}) \, e(g^{b_1}, g^{b_2})^{H(m)} \, e(\sigma_2, \sigma_3).$$

It also checks that: 

$$e(\sigma_5, g) = e(g^{\alpha_1}, g^{\alpha_2}) \, e(g^{\beta_1}, g^{\beta_2})^{H(m)} \, e(\sigma_2, \sigma_4).$$

Finally, it checks that: 

$$e(g, \sigma_4) = e(k, \sigma_3).$$

If all of these checks pass, it outputs “True.” Otherwise, it outputs “False.” 

We note that the use of the random oracle $H$ to hash messages in $\{0,1\}^*$ to 
elements in $\mathbb{Z}_N$ in this public key scheme, which forms the base of our identity-based 
sequential aggregate signatures, is not necessary, and can be replaced in 
the following way. Instead of using $g^{a_1 a_2 + b_1 b_2 H(m)}$, we can assume our messages 
are $n$-bit strings (denoted $m_1 m_2 \ldots m_n$) and use $g^{a_0 b_0} \prod_{i=1}^{n} g^{m_i a_i b_i}$. Here, 
$g^{a_0}, \ldots, g^{a_n}, g^{b_0}, \ldots, g^{b_n}$ will be in the public verification key. In the proof, instead 
of guessing which random oracle query corresponds to the challenge message, 
the simulator will guess a bit which differs between the challenge message 
and the message that will be used in the forgery. This guess will be correct with 
non-negligible probability. However, the use of the random oracle model to prove 
security for the full identity-based sequential aggregate scheme is still required. 
Removing the random oracle model altogether remains an open problem. 


Forgery Classes. We will divide the forgery types based on whether they have 
any $G_{p_2}$ or $G_{p_3}$ components on $\sigma_1$ or $\sigma_5$. We let $z_2 \in \mathbb{Z}_N$ denote the exponent 
represented by the tuple $(0 \bmod p_1, 1 \bmod p_2, 0 \bmod p_3)$, and we let $z_3 \in \mathbb{Z}_N$ 
denote the exponent represented by the tuple $(0 \bmod p_1, 0 \bmod p_2, 1 \bmod p_3)$. 
Then we can define the forgery classes as follows. Type I forgeries are of the 
form 

$$V_I = \{ (m^*, \sigma^*) \in V \mid (\sigma_1^*)^{z_2} = 1,\ (\sigma_1^*)^{z_3} = 1 \text{ and } (\sigma_5^*)^{z_2} = 1,\ (\sigma_5^*)^{z_3} = 1 \},$$

while Type II forgeries are of the form 

$$V_{II} = \{ (m^*, \sigma^*) \in V \mid (\sigma_1^*)^{z_2} \neq 1 \text{ or } (\sigma_5^*)^{z_2} \neq 1 \text{ or } (\sigma_1^*)^{z_3} \neq 1 \text{ or } (\sigma_5^*)^{z_3} \neq 1 \}.$$

In other words, Type I forgeries have $\sigma_1^*, \sigma_5^* \in G_{p_1}$, while Type II forgeries 
have a non-zero component in $G_{p_3}$ or $G_{p_2}$ on at least one of these terms. We 
note that these types are disjoint and exhaustive. 

We state our complexity assumptions and prove security of this scheme in the 
full version. Some of the assumptions we employ were previously used in [41, 42]. 
Those that are new are justified in the generic group model. 

Abstract. In this paper, we discuss solving the DLP over GF($3^{6 \cdot 97}$) 
by using the function field sieve (FFS) for breaking pairing-based cryptosystems 
using the $\eta_T$ pairing over GF($3^{97}$). The extension degree 97 
has been intensively used in benchmarking tests for the implementation 
of the $\eta_T$ pairing, and the order (923-bit) of GF($3^{6 \cdot 97}$) is substantially 
larger than the previous world record (676-bit) of solving the DLP by 
using the FFS. We implemented the FFS for the medium prime case, 
and proposed several improvements of the FFS. Finally, we succeeded in 
solving the DLP over GF($3^{6 \cdot 97}$). The entire computation required 
about 148.2 days using 252 CPU cores. 

Keywords: pairing-based cryptosystems, $\eta_T$ pairing, discrete logarithm 
problems, function field sieve. 


1 Introduction 

After the advent of the tripartite Diffie-Hellman (DH) key exchange scheme 
and ID-based encryption using pairing [?], plenty of attractive pairing-based 
cryptosystems have been proposed, for example, short signature [?], keyword 
searchable encryption [?], efficient broadcast encryption [?], attribute-based 
encryption [?], and functional encryption [?]. Pairing-based cryptosystems 
have become a major research topic in cryptography. 

Pairing-based cryptosystems are constructed on the groups $G_1$, $G_1'$, and $G_2$ 
of the same order with a bilinear pairing $G_1 \times G_1' \to G_2$. The security of 
pairing-based cryptosystems is based on the difficulty in solving several number-theoretic 
problems such as the computational/decisional bilinear DH problem 
(CBDH/DBDH), strong DH problem (SDH), decisional linear problem (DLIN), 
and symmetric external DH problem (SXDH). However, the most important 
number-theoretic problem in pairing-based cryptosystems is the discrete logarithm 
problem (DLP) on $G_1$, $G_1'$, and $G_2$. All the other number-theoretic problems 
above are no longer intractable once the DLP on $G_1$, $G_1'$, or $G_2$ is broken. 
Therefore, it is important to investigate the difficulty in solving the DLP. 

Table 1. Summary of time data for solving DLP over GF($3^{6 \cdot 97}$) 

phase                  method                  time         machine environment 
collecting relations   lattice sieve           53.1 days    212 CPU cores 
linear algebra         parallel Lanczos        80.1 days    252 CPU cores 
individual logarithm   rationalization and     15.0 days    168 CPU cores 
                       special-Q descent 
total                                          148.2 days   252 CPU cores 



One of the most efficient algorithms for implementing the pairing is the $\eta_T$ pairing 
[?] defined over a supersingular elliptic curve $E$ on the finite field GF($3^n$), 
where $n$ is a positive integer. Since the embedding degree of $E$ is 6, the $\eta_T$ pairing 
can reduce a DLP over $E$ on GF($3^n$), which is called an ECDLP, to a DLP 
over GF($3^{6n}$). Joux proposed the (probably) first cryptographic scheme [?] that 
uses the pairing over $E$. Boneh et al. then applied the pairing over $E$ to the short 
signature scheme [?], where a point $(x, y)$ on $E$ for extension degree $n = 97$ can 
be represented as a signature value, e.g., x = KrpIcV009CJ8iyBS8MyVkNrMyE. At 
CRYPTO 2002, Barreto et al. presented algorithms for efficiently computing the Tate 
pairing over $E$ [?]. Many high-speed implementations of pairing over $E$ have 
subsequently been proposed [?]. For many of these implementations, 
benchmark tests using the extension degree $n = 97$ have been conducted. Therefore, 
we focus on the DLP over the finite field GF($3^{6 \cdot 97}$) in this paper. The cardinality 
of the subgroup of the supersingular elliptic curve is 151 bits, and that of GF($3^{6 \cdot 97}$) 
is 923 bits. The size of our target DLP is 247 bits larger than the previous world 
record of solving the DLP over GF($3^{6 \cdot 71}$), whose cardinality is 676 bits [?]. The 
current world record for solving an ECDLP is the 112-bit ECDLP [?]. Pollard's 
$\rho$ method is used for solving the 112-bit ECDLP, and has not reached the ability 
for solving the 151-bit ECDLP over the subgroup of $E$. 

In this paper, we analyze the difficulty in solving the DLP over GF($3^{6 \cdot 97}$) by using 
the function field sieve (FFS), which is known as the asymptotically fastest algorithm 
[?]. Since the FFS proposed by Joux and Lercier (JL06-FFS) [?] is suitable 
for solving the DLP over a finite field whose characteristic is small, we use the 
JL06-FFS and propose several efficient techniques for increasing its speed. Note 
that the FFS generally consists of four phases: polynomial selection, collecting relations, 
linear algebra, and individual logarithm, and the time-consuming phases 
are collecting relations and linear algebra. For the collecting relations phase, we 
applied several techniques: lattice sieve for the JL06-FFS, lattice sieve with single 
instruction multiple data (SIMD), and optimization for our parameters. These 
techniques enable the sieving program to run about 6 times faster. In the linear algebra 
phase, we applied careful treatments of singleton-clique and merging [?] to 
the Galois action originating from extension degree 6 of GF($3^{6 \cdot 97}$), with which the 
size of the matrix used for the Lanczos method is reduced to approximately 30%. 
By implementing the JL06-FFS with our improvements, we succeeded in solving 
the DLP over GF($3^{6 \cdot 97}$) by using 252 CPU cores (Core2 quad, Xeon, etc.) for the 
target problem discussed in Section 3.1. As shown in Table 1, the computations 
required 53.1 days for the collecting relations phase, 80.1 days for the linear algebra 
phase, and 15.0 days for the individual logarithm phase. Thus, a total of 148.2 
days were required to solve the DLP over GF($3^{6 \cdot 97}$) by using 252 CPU cores. Our 
computational results contribute to the secure use of pairing-based cryptosystems 
with the $\eta_T$ pairing. 

2 Pairing-Based Cryptosystems and Discrete Logarithm 
Problem (DLP) 

In this section, we briefly explain the security of pairing-based cryptosystems 
and give a general overview of the function field sieve (FFS). We also mention 
its parameters such as the smoothness bound B. 

2.1 Pairing-Based Cryptosystems and DLP 

Many efficient cryptographic protocols using a bilinear pairing have been proposed 
(for example [?]), and high-speed implementations for the $\eta_T$ 
pairing have been reported (for example [?]). We discuss the 
security of pairing-based cryptosystems with the $\eta_T$ pairing over GF($3^n$) for an 
integer $n$. The security of pairing-based cryptosystems with the $\eta_T$ pairing depends 
on the difficulty in solving the DLP over the supersingular elliptic curves. 
Additionally, MOV reduction [23] reduces this problem to a DLP over GF($3^{6n}$)$^*$ 
since the embedding degree of the $\eta_T$ pairing is 6. 

In particular, the $\eta_T$ pairing is a bilinear map $\eta_T : G_1 \times G_1 \to G_2$, 
where $G_1$ is an additive subgroup of a supersingular elliptic curve over GF($3^n$), 
$G_2$ is a cyclic subgroup of GF($3^{6n}$)$^*$, and the cardinalities of $G_1$, $G_2$ are the same 
prime number $P$. The security of pairing-based cryptosystems with the $\eta_T$ pairing 
depends on the difficulty of not only an ECDLP over $G_1$ but also a DLP over 
$G_2$ by MOV reduction. To explain this fact, we take ID-based encryption constructed 
on pairing-based cryptosystems as an example. The ID-based encryption 
has a master key $s_{key} \in \mathbb{Z}_P$. Each user ID is deterministically transformed 
into a point $Q_{ID} \in G_1$, and the secret key $S_{ID}$ is defined by $[s_{key}] Q_{ID}$. Therefore, 
solving the ECDLP over $G_1$, namely $S_{ID} = [s_{key}] Q_{ID}$, we obtain the master key 
$s_{key} = \log_{Q_{ID}} S_{ID}$. Additionally, for an arbitrary point $\mathcal{R} \in G_1$, we compute 
$\eta_T(S_{ID}, \mathcal{R}), \eta_T(Q_{ID}, \mathcal{R}) \in G_2$, and then have $\eta_T(S_{ID}, \mathcal{R}) = \eta_T([s_{key}] Q_{ID}, \mathcal{R}) = 
\eta_T(Q_{ID}, \mathcal{R})^{s_{key}} \in G_2$. This implies that $s_{key} = \log_{\eta_T(Q_{ID}, \mathcal{R})} \eta_T(S_{ID}, \mathcal{R})$ is also 
available by solving the DLP over $G_2$. In this paper, we discuss the DLP over a 
subgroup of GF($3^{6n}$)$^*$. 

2.2 General Overview of FFS 

The FFS is the asymptotically fastest algorithm for solving a DLP over finite 
fields of small characteristics. Adleman proposed the first FFS in 1994 [3]. After 
that, several variants of the FFS have been proposed; Adleman and Huang improved 
the FFS [?], and Joux and Lercier proposed two more practical FFS's, 
JL02-FFS [?] and JL06-FFS [?]. The details of JL06-FFS are explained in 
the sections that follow. 

In this section, we give a general overview of an FFS that consists of four 
phases: polynomial selection, collecting relations, linear algebra, and individual 
logarithm. In the overview, we aim at computing $\log_g T$ where $T \in \langle g \rangle \subset$ 
GF($3^{6n}$)$^*$. 

Polynomial Selection Phase: We select $\kappa$ from $\kappa = 1, 2, 3, 6$ for the coefficient 
field of GF($3^\kappa$)$[x]$, and a bivariate polynomial $H(x, y) \in$ GF($3^\kappa$)$[x, y]$ such that 
$H$ satisfies the eight conditions proposed by Adleman [?] and $\deg_y H = d_H$ for 
a given parameter value $d_H$. We compute a random polynomial $m \in$ GF($3^\kappa$)$[x]$ 
of degree $d_m$ and a monic irreducible polynomial $f \in$ GF($3^\kappa$)$[x]$ such that 

$$H(x, m) \equiv 0 \pmod{f}, \qquad \deg f = 6n/\kappa. \qquad (1)$$

We then have GF($3^{6n}$) $\cong$ GF($3^\kappa$)$[x]/(f)$. Moreover, there is a surjective homomorphism 

$$\xi : \mathrm{GF}(3^\kappa)[x, y]/(H) \to \mathrm{GF}(3^{6n}) \cong \mathrm{GF}(3^\kappa)[x]/(f), \qquad y \mapsto m.$$

We select a positive integer $B$ as a smoothness bound, and define a rational 
factor base $F_R(B)$ and an algebraic factor base $F_A(B)$ as follows. 

$$F_R(B) = \{ p \in \mathrm{GF}(3^\kappa)[x] \mid \deg(p) \le B,\ p \text{ is monic irreducible} \}, \qquad (2)$$
$$F_A(B) = \{ \langle p, y - t \rangle \in \mathrm{Div}(\mathrm{GF}(3^\kappa)[x, y]/(H)) \mid p \in F_R(B),\ H(x, t) \equiv 0 \bmod p \}, \qquad (3)$$

where Div(GF($3^\kappa$)$[x, y]/(H)$) is the divisor group of GF($3^\kappa$)$[x, y]/(H)$ and 
$\langle p, y - t \rangle$ is a divisor generated by $p$ and $y - t$. Note that $F_R(0) = F_A(0) = \emptyset$. 
We simply call the set $F_R(B) \cup F_A(B)$ a factor base and the set $F_R(k) \setminus F_R(k-1) 
\cup F_A(k) \setminus F_A(k-1)$ a factor base of degree $k$ for $k = 1, 2, \ldots, B$. 

Collecting Relations Phase: We select positive integers $R$, $S$ and collect a 
sufficient amount of pairs $(r, s) \in (\mathrm{GF}(3^\kappa)[x])^2$ such that 

$$\deg r \le R, \quad \deg s \le S, \quad \gcd(r, s) = 1, \qquad (4)$$
$$rm + s = \prod_{p_i \in F_R(B)} p_i^{a_i}, \qquad (5)$$
$$\langle ry + s \rangle = \sum_{\langle p_j, y - t_j \rangle \in F_A(B)} b_j \langle p_j, y - t_j \rangle, \qquad (6)$$

for some non-negative integers $a_i$, $b_j$ by using a sieving algorithm such as the 
lattice sieve discussed in Section 4.1. To efficiently compute $b_j$ in (6), we use the 
following equivalent property instead of (6): 

$$(-r)^{d_H} H(x, -s/r) = \prod_{\langle p_j, y - t_j \rangle \in F_A(B)} p_j^{b_j}. \qquad (7)$$

The $(r, s)$ satisfying (4), (5), and (7) is called a $B$-smooth pair. Let $h$ be the class 
number of the quotient field of GF($3^\kappa$)$(x)[y]/(H)$ and assume that $h$ is coprime 
to $(3^{6n} - 1)/(3^\kappa - 1)$. Then the following congruence holds: 

$$\sum_{p_i \in F_R(B)} a_i \log_g p_i \equiv \sum_{\langle p_j, y - t_j \rangle \in F_A(B)} b_j \log_g s_j \pmod{(3^{6n} - 1)/(3^\kappa - 1)}, \qquad (8)$$

where $s_j = \xi(t_j)^{1/h}$, $\langle t_j \rangle = h \langle p_j, y - t_j \rangle$. We call the congruence (8) a "relation" 
in this paper. Moreover, free relation [?] provides additional relations without 
computation with a sieving algorithm. 

Linear Algebra Phase: We generate a system of linear equations described as 
a large matrix from those collected relations and reduce the rank of the matrix by 
filtering [?]. The reduced system of linear equations is solved using the parallel 
Lanczos method [?] or other methods, and the discrete logarithms of elements 
in the factor base are obtained: 

$$\log_g p_1, \ldots, \log_g p_{\#F_R(B)}, \quad \log_g s_1, \ldots, \log_g s_{\#F_A(B)}.$$

Individual Logarithm Phase: Note that our goal is to compute $\log_g T$. Therefore, 
we find integers $a_i$, $b_j$ using the special-Q descent [?] such that 

$$\log_g T \equiv \sum_{p_i \in F_R(B)} a_i \log_g p_i + \sum_{\langle p_j, y - t_j \rangle \in F_A(B)} b_j \log_g s_j \pmod{(3^{6n} - 1)/(3^\kappa - 1)}.$$

The computational time for the individual logarithm phase is smaller than those 
for the collecting relations and linear algebra phases. 

3 Target Problem for n = 97 and Setting of Parameters 
for FFS 

We discuss solving the DLP over a subgroup of GF($3^{6 \cdot 97}$)$^*$, where the cardinality 
of the subgroup is 151 bits. To estimate the time complexity of solving such 
a DLP, we set a target problem without any deliberate choice on our part, deriving 
it from the circle constant $\pi$ and the base $e$ of the natural logarithm. The details 
are explained in Section 3.1. 
To solve the target problem effectively, we select the parameter values of the 
FFS and estimate important numbers, e.g., the number of elements in the factor 
base, for it. The details are given in Section 3.2. 

3.1 Target Problem 

For pairing-based cryptosystems, many high-speed implementations of the $\eta_T$ 
pairing over supersingular elliptic curves on GF($3^n$) have been reported [?], 
and many benchmark tests using the $\eta_T$ pairing have been conducted 
for GF($3^{97}$). In this paper, we deal with a supersingular elliptic curve defined 
by 

$$E := \{ (x, y) \in \mathrm{GF}(3^{97})^2 : y^2 = x^3 - x + 1 \} \cup \{ O \},$$

where $O$ is the point at infinity. The order of $E$ is $3^{97} + 3^{49} + 1 = 7 P_{151}$, 
where $P_{151}$ is a 151-bit prime number as follows: 

$P_{151}$ = 2726865189058261010774960798134976187171462721. 

Next, let $G_1$ be the subgroup of $E$ of order $P_{151}$ and let $G_2$ be the subgroup 
of GF($3^{6 \cdot 97}$)$^*$ of order $P_{151}$. Note that, since the orders of $G_1$ and $G_2$ are prime 
numbers, every element of $G_1 \setminus \{O\}$ and $G_2 \setminus \{1\}$ is a generator of $G_1$ and $G_2$, 
respectively. The $\eta_T$ pairing for $n = 97$ is a map from $G_1 \times G_1$ to $G_2$. 
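The stated sizes can be checked with ordinary integer arithmetic. The snippet below is a small sanity check rather than part of the authors' computation; it confirms the factorization of the curve order, the bit lengths quoted in the introduction, and that $3^{97}$ has multiplicative order 6 modulo $P_{151}$ (the embedding degree). It does not test the primality of $P_{151}$.

```python
# Values copied from the text above.
P151 = 2726865189058261010774960798134976187171462721
n = 97
q = 3**n

curve_order = q + 3**((n + 1) // 2) + 1       # 3^97 + 3^49 + 1
cofactor, remainder = divmod(curve_order, P151)

subgroup_bits = P151.bit_length()             # size of G_1 and G_2
field_bits = (3**(6 * n)).bit_length()        # size of GF(3^(6*97))
record_bits = (3**(6 * 71)).bit_length()      # size of GF(3^(6*71))

# q^3 = -1 and q^6 = 1 modulo P151, so G_2 sits in GF(3^(6*97))^*
# and in no proper subfield of the form GF(3^(97*k)) with k < 6.
q_cubed = pow(3, 3 * n, P151)
q_sixth = pow(3, 6 * n, P151)
```

Here `cofactor` is 7 with zero remainder, and the difference `field_bits - record_bits` is the 247-bit gap mentioned in the introduction.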

Our goal is to solve the ECDLP in $G_1$. To set our target problem without 
deliberate choice, we select two elements $Q_\pi$, $Q_e$ in $G_1$, which correspond to the circle 
constant $\pi$ and the base $e$ of the natural logarithm, respectively. We explain how we select $Q_\pi$ 
and $Q_e$ as follows. First, we describe GF($3^{97}$) as GF(3)$[x]/(x^{97} + x^{16} + 2)$, 
where the irreducible polynomial $x^{97} + x^{16} + 2 \in$ GF(3)$[x]$ is well used for the 
fast implementation of field operations. An element in GF($3^{97}$) is represented by 
$\sum_{i=0}^{96} d_i x^i$, where $d_i \in$ GF(3) $= \{0, 1, 2\}$. To transform $\pi$ and $e$ to an element in 
GF($3^{97}$) respectively, we define a bijective map $\varphi : \sum_{i=0}^{96} d_i x^i \mapsto \sum_{i=0}^{96} d_i 3^i \in \mathbb{Z}$. 

We then transform $\pi$ and $e$ to the 3-adic integer of 97 digits by $\lfloor \pi \cdot 3^{95} \rfloor$ and 
$\lfloor e \cdot 3^{96} \rfloor$, respectively. 

From these values, we define $Q_\pi = (x_\pi, y_\pi)$ and $Q_e = (x_e, y_e) \in G_1$ as follows. 
We first find the non-negative smallest 3-adic integers $c_\pi$ and $c_e$ such that 
$\varphi^{-1}(\lfloor \pi \cdot 3^{95} \rfloor + c_\pi)$ and $\varphi^{-1}(\lfloor e \cdot 3^{96} \rfloor + c_e)$ become $x$-coordinates of the elements $Q_\pi$ and 
$Q_e$ in the subgroup $G_1$ on $E$. In fact we can set $x_\pi = \varphi^{-1}(\lfloor \pi \cdot 3^{95} \rfloor + (11)_3)$ 
and $x_e = \varphi^{-1}(\lfloor e \cdot 3^{96} \rfloor + (120)_3)$. There are two points in $G_1 \setminus \{O\}$ of the same 
$x$-coordinate. We then set the corresponding $y$-coordinate by computing 
$y_\pi = (x_\pi^3 - x_\pi + 1)^{(3^{97}+1)/4}$ and $y_e = (x_e^3 - x_e + 1)^{(3^{97}+1)/4}$ in GF($3^{97}$), respectively. 

Again, our goal is to solve the ECDLP in $G_1$, i.e., for given $Q_\pi, Q_e \in G_1$ we 
try to find the integer $s$ such that $Q_\pi = [s] Q_e$. On the other hand, the $\eta_T$ pairing 
enables us to reduce the ECDLP in $G_1$ to the DLP over $G_2$ by the relationship 
$\eta_T(Q_\pi, Q_\pi) = \eta_T(Q_\pi, Q_e)^s$. Therefore, we can find $s$ by computing the discrete 
logarithm 

$$s = \log_{\eta_T(Q_\pi, Q_e)} \eta_T(Q_\pi, Q_\pi) = \log_g \eta_T(Q_\pi, Q_\pi) / \log_g \eta_T(Q_\pi, Q_e) \bmod P_{151},$$

for a generator $g$ of $G_2$. 
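The change of base in the last formula can be illustrated with a deliberately tiny stand-in for $G_2$. In the sketch below, the order-11 subgroup of $\mathbb{Z}_{23}^*$ generated by 2 plays the role of $G_2$, and the two "pairing values" are simply constructed so that one is the $s$-th power of the other. The brute-force logarithm is only a placeholder for the FFS, and all names are illustrative.

```python
MODULUS, ORDER, G = 23, 11, 2   # toy stand-in: 2 has order 11 in Z_23^*

def brute_force_log(base, target):
    """Smallest k with base^k = target in the toy group (placeholder for the FFS)."""
    acc = 1
    for k in range(ORDER):
        if acc == target:
            return k
        acc = acc * base % MODULUS
    raise ValueError("target is not in the subgroup generated by base")

def recover_s(pairing_pi_pi, pairing_pi_e):
    """s = log_g(eta(Q_pi, Q_pi)) / log_g(eta(Q_pi, Q_e)) mod the group order."""
    numerator = brute_force_log(G, pairing_pi_pi)
    denominator = brute_force_log(G, pairing_pi_e)
    return numerator * pow(denominator, -1, ORDER) % ORDER

# Hypothetical pairing values: eta(Q_pi, Q_e) = g^7 and eta(Q_pi, Q_pi) = eta(Q_pi, Q_e)^4.
eta_pi_e = pow(G, 7, MODULUS)
eta_pi_pi = pow(eta_pi_e, 4, MODULUS)
s = recover_s(eta_pi_pi, eta_pi_e)
```

Two logarithms to the common base $g$ are computed and divided modulo the prime group order, which is why the individual logarithm phase is run for both pairing values.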


3.2 Parameter Settings for FFS 

In this section, we explain the parameter setting used for our implementations 
of the FFS. Hayashi et al. [?] reported that, when $n < 509$, the JL06-FFS [?] 
is more efficient for solving the DLP over GF($3^{6n}$) than the JL02-FFS [?]. Thus, 
we use the JL06-FFS for our computation. In the JL06-FFS, the condition that 
"$r$ is monic" is introduced into the collecting relations phase in order to compute 
efficiently. For the remainder of this paper, the FFS refers to the JL06-FFS. 

To solve our DLP over GF($3^{6 \cdot 97}$), we have to select several parameter values 
of the FFS, such that its computational time is small enough for a fixed extension 
degree $n$. The parameter values for $n = 97$ are listed in [?, Table 3], and we use 
those parameter values for our computation. 

We select the parameter $\kappa \in \{1, 2, 3, 6\}$ as follows. GF($3^{6 \cdot 97}$) is described 
as GF($3^\kappa$)$[x]/(f)$, where $f \in$ GF($3^\kappa$)$[x]$ is an irreducible polynomial of degree 
$6 \cdot 97/\kappa$. The appropriate value of $\kappa$ is given in [?, Table 3], i.e., $\kappa = 6$. However, 
we select $\kappa = 3$ for the following reasons. In the linear algebra phase, filtering [?] 
is performed to reduce the size of the matrix. Then it is required that all elements 
in the factor base correspond to the memory addresses of the PC for efficient 
computation. The number of elements in the factor base for $\kappa = 6$ is much larger 
than that for $\kappa = 3$, so $\kappa = 3$ is advantageous on this point. Additionally, 
[?, Table 3] shows that the computational cost of the FFS for $\kappa = 3$ is only about 
twice as much as that for $\kappa = 6$. We conducted test runs for $\kappa = 3, 6$ in the 
collecting relations phase, then noticed that our implementation for $\kappa = 3$ was 
much faster than for $\kappa = 6$, so we set $\kappa = 3$. 

Polynomial Selection Phase: We select the bivariate polynomial $H(x, y)$ of 
the form $x + y^{d_H}$ for a given parameter $d_H$ of the FFS in the same manner as 
[?]. Then we search for an irreducible polynomial $f \in$ GF($3^\kappa$)$[x]$ and a polynomial 
$m \in$ GF($3^\kappa$)$[x]$ which satisfy the condition (1), by factoring $H(x, m)$ for 
a randomly picked polynomial $m$ whose degree is $d_m$. In fact, we randomly pick 
$m$ from GF(3)$[x]$, so that $f$ is also in GF(3)$[x]$ for use of the Galois action. 
From [?, Table 3], we set $d_H$ and $d_m$ as 6 and 33, respectively. 

Next, we select the smoothness bound $B = 6$ by using [?, Table 3] for (2) 
and (3), i.e., a rational factor base $F_R(B)$ and an algebraic factor base $F_A(B)$. 
$\#F_R(B)$ is 67576068 and $\#F_A(B)$ is 67572597, thus the number of elements of the 
factor base, i.e., $\#F_R(B) + \#F_A(B)$, is 135148665. 
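The size of the rational factor base follows from the standard count of monic irreducible polynomials of degree $d$ over GF($q$), namely $\frac{1}{d} \sum_{e \mid d} \mu(e) q^{d/e}$. The following sketch evaluates this formula for $q = 3^3 = 27$ and $B = 6$; it is a check of the stated numbers, not the authors' code. The algebraic factor base depends on $H$ and is not recomputed here, so its size is taken from the text.

```python
def mobius(n):
    """Moebius function by trial division (adequate for tiny n)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0              # squared prime factor
            result = -result
        p += 1
    return -result if n > 1 else result

def count_monic_irreducibles(q, d):
    """Number of monic irreducible polynomials of degree d over GF(q)."""
    total = sum(mobius(e) * q ** (d // e) for e in range(1, d + 1) if d % e == 0)
    return total // d

Q_FIELD = 3 ** 3                      # kappa = 3, so the coefficient field is GF(27)
BOUND = 6                             # smoothness bound B

per_degree = {d: count_monic_irreducibles(Q_FIELD, d) for d in range(1, BOUND + 1)}
rational_size = sum(per_degree.values())          # #F_R(6)
ALGEBRAIC_SIZE = 67572597                         # #F_A(6), value quoted in the text
factor_base_size = rational_size + ALGEBRAIC_SIZE
```

The degree-6 entry of `per_degree` is the cardinality of $F_R(6) \setminus F_R(5)$, the set from which the special-Q's are drawn.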

Collecting Relations Phase: In the collecting relations phase, we use the
lattice sieve 0 and the free relation 0 and collect many relations 0; (r, s) ∈
(GF(3^κ)[x])^2 satisfying 0, 0, 0, where r is monic. The search range for the
lattice sieve depends on the maximum degrees R, S of r, s. We set R = S = 6
based on [0, Table 3]. The lattice sieve gives a certain amount of relations
for one special-Q, which is defined in Section 4.1. Therefore, we require a
sufficient number of special-Q's so that the number of relations obtained in the
collecting relations phase is larger than that of all elements in the factor base.
The minimum sufficient number of special-Q's is estimated by the following
process. We have to select special-Q's from the subset F_R(6)\F_R(5), whose
cardinality is 64566684. Let θ_min be the minimum sufficient ratio of special-Q's
over all elements in F_R(6)\F_R(5). For n = 97 and κ = 3, we can estimate
θ_min = 0.01292 [0, Table 3]. Therefore, the number of special-Q's must be
larger than ⌈0.01292 · 64566684⌉ = 834202. In our computation, we set 2500000
as the number of special-Q's to obtain more relations than we require, since we
expect that these excess relations will help us reduce the size of the matrix during
filtering, especially in singleton-clique.
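
The cardinalities quoted above can be cross-checked with a short computation. The sketch below assumes that the rational factor base F_R(B) consists of all monic irreducible polynomials of degree at most B over GF(3^3); this is an assumption on our part, although the resulting counts agree with the figures in the text. The function names are illustrative only.

```python
from math import ceil

def mobius(n):
    """Moebius function for small n, by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # n has a squared prime factor
            result = -result
        p += 1
    return -result if n > 1 else result

def monic_irreducibles(q, d):
    """Number of monic irreducible polynomials of degree d over GF(q)."""
    total = sum(mobius(e) * q ** (d // e) for e in range(1, d + 1) if d % e == 0)
    return total // d

q, B = 3 ** 3, 6
by_degree = {d: monic_irreducibles(q, d) for d in range(1, B + 1)}
rational_fb_size = sum(by_degree.values())        # assumed to equal #F_R(6)
special_q_pool = by_degree[6]                     # assumed to equal #(F_R(6)\F_R(5))
min_special_q = ceil(0.01292 * special_q_pool)    # bound from theta_min

print(special_q_pool, rational_fb_size, min_special_q)
# prints: 64566684 67576068 834202
```

Under this assumption, the 2500000 special-Q's actually used are roughly three times the estimated minimum.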

4 Implementation 

In this section, we propose the following efficient implementation techniques:
the lattice sieve for the JL06-FFS and optimization for our parameters in the
collecting relations phase, and the data structure and the parallel Lanczos method for
the Galois action in the linear algebra phase, for reducing the computational cost
of the FFS for solving the DLP over GF(3^{6·97}). Parameters (κ, d_H, d_m, B, R, S)
are fixed as (3, 6, 33, 6, 6, 6). The reasoning for this is explained in Section 3.2.

4.1 Collecting Relations Phase 

In the collecting relations phase, we used the lattice sieve M in a similar fashion 
to factoring a large integer j2^] and solving discrete logarithm problems 00. 
We give an overview of our implementation of the lattice sieve in the following 
paragraphs. More details are described in 0- 

Lattice Sieve for JL06-FFS: Sieving with the lattice sieve is performed for
(r, s) ∈ (GF(3^3)[x])^2 such that the formula Q given in Section is divisible
by an element Q chosen from a subset of the rational factor base F_R(6)\F_R(5)
(this Q is called a “special-Q”). Recall that deg r and deg s are not greater
than R = 6 and S = 6, respectively. Such (r, s) can be represented as (r, s) =
c(r_1, s_1) + d(r_2, s_2) for given reduced lattice bases (r_1, s_1), (r_2, s_2) ∈ (GF(3^3)[x])^2
and any c, d ∈ GF(3^3)[x] such that deg(c r_1 + d r_2) ≤ 6 and deg(c s_1 + d s_2) ≤ 6;
sieving is then done on the bounded c-d plane. After sieving, we conduct the
smoothness test 0 for “candidates” that are evaluated as B-smooth pairs with
high probability by using the lattice sieve.

A problem of applying the lattice sieve to the FFS is the condition “r is
monic” described in Section EO. Since r is represented as c r_1 + d r_2, it is difficult
to efficiently keep r monic; it might require degree evaluations and branches.
Instead of choosing monic r, we introduce the condition r ≡ 1 mod x. To satisfy
this condition, we restrict r_1 and r_2 such that r_1 ≡ 0 mod x and r_2 ≡ 1 mod x.
Then sieving is performed on the bounded c-d plane with restriction d ≡ 1 mod x,
whose size is reduced to 1/27 compared with the original bounded c-d plane. This
sieving procedure with the restricted condition can be implemented without
extra costs such as additional degree evaluations and additional branches.
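
The effect of the restriction can be seen by reducing r modulo x. The short derivation below uses only the three congruences stated above; the remark on the factor 1/27 additionally assumes that, in the unrestricted plane, the constant term of d ranges over all 27 elements of GF(3^3).

```latex
\begin{align*}
r &= c\,r_1 + d\,r_2 \\
  &\equiv c \cdot 0 + 1 \cdot 1 \pmod{x} \\
  &\equiv 1 \pmod{x}.
\end{align*}
```

The congruence holds for every c, so no test on c is needed. Fixing the constant term of d to 1 keeps one of the 27 possible constant terms, which is consistent with the stated reduction of the c-d plane to 1/27 of its original size.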

Lattice Sieve with SIMD: Since operations of GF(3) can be represented
using logical instructions 0, operations of GF(3^3)[x] can be performed using a
combination of logical and shift instructions. This means SIMD implementation
is appropriate for efficient computation of the lattice sieve. We represent GF(3^3)
as polynomial basis GF(3)[ω]/(ω^3 − ω − 1), and its element is represented using
6-bit (h_1, l_1, h_ω, l_ω, h_{ω^2}, l_{ω^2}) ∈ GF(2)^6 in our implementation. We then pack
16 elements of GF(3^3)[x] of degree at most 7 into 6 registers of 128 bits, and
treat 16 elements with SIMD. Note that the upper bound of the degree of our
SIMD data structure is chosen for efficient access to each element in GF(3^3)[x]. On the
other hand, since we choose B, R, S as all 6, the upper bound of the degrees of
c, d, r_1, s_1, r_2, s_2 ∈ GF(3^3)[x] and p in the factor base, which are treated in the
lattice sieve, is also 6. Therefore, our SIMD structure can store the elements
treated in the lattice sieve.

[Figure 1. Horizontal axis: first two weeks of computing days for collecting relations phase.
Left vertical axis: estimation days for collecting relations phase.
Right vertical axis: number of collected relations.
(Period with no data between 8-9 days was due to human error in operating PC.)]

Fig. 1. Our improvement in collecting relations phase for first two weeks
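
To illustrate how GF(3) arithmetic reduces to logical instructions, the following minimal sketch performs lane-wise addition on bit-sliced words. The two-bit encoding (0 = (0,0), 1 = (0,1), 2 = (1,0)) and the particular Boolean formulas are assumptions chosen for readability; they are not taken from the paper, and shorter instruction sequences exist. A Python integer masked to 128 bits stands in for one SIMD register.

```python
MASK = (1 << 128) - 1   # one 128-bit "register"; each bit position is one lane

def gf3_add(ah, al, bh, bl):
    """Lane-wise a + b in GF(3) using only AND, OR, NOT.

    Each lane holds a trit encoded as (h, l): 0=(0,0), 1=(0,1), 2=(1,0).
    """
    a_zero = ~(ah | al) & MASK                  # lanes where a == 0
    b_zero = ~(bh | bl) & MASK                  # lanes where b == 0
    rl = (a_zero & bl) | (al & b_zero) | (ah & bh)   # sum == 1: 0+1, 1+0, 2+2
    rh = (a_zero & bh) | (ah & b_zero) | (al & bl)   # sum == 2: 0+2, 2+0, 1+1
    return rh, rl

def gf3_neg(h, l):
    """Lane-wise negation: swaps 1 and 2, fixes 0."""
    return l, h

def pack(trits):
    """Pack a list of values in {0, 1, 2} into (h, l) words, lane i = bit i."""
    h = l = 0
    for i, t in enumerate(trits):
        h |= (t >> 1) << i
        l |= (t & 1) << i
    return h, l

def unpack(h, l, n):
    """Inverse of pack for the first n lanes."""
    return [2 * ((h >> i) & 1) + ((l >> i) & 1) for i in range(n)]

a = [0, 0, 0, 1, 1, 1, 2, 2, 2]
b = [0, 1, 2, 0, 1, 2, 0, 1, 2]
print(unpack(*gf3_add(*pack(a), *pack(b)), len(a)))
# prints: [0, 1, 2, 1, 2, 0, 2, 0, 1]
```

An element of GF(3^3) then occupies three such (h, l) pairs, one per basis element, which matches the 6-bit representation described above.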

History of Our Optimizations: Figure 1 shows the process of our improvements
in the collecting relations phase for the first two weeks. We improved
our implementation of the lattice sieve four times during this period. We first
used large prime variation to omit sieving for the factor base of degree 6 and
implemented the lattice sieve for the FFS with SIMD implementation. We then
ran the program for the first four days (stage I in Fig. 1). At that point, the
estimated total number of days for the collecting relations phase was about 360
days. While the sieving program was running, we found that sieving for the
factor base of degree 5 requires heavier computation than sieving for the factor
bases of degree 1, 2, 3 and 4. Therefore, we improved sieving for the factor base
of degree 5; thus, our sieving program became over 3 times faster than before
(stage II in Fig. 1). Next, we optimized register usage for input values and
omitted wasteful computations (stage III in Fig. 1). Additionally, we omitted sieving
for the factor base of degree 1 (stage IV in Fig. 1), since that computational time
was larger than that for the factor bases of degree 2, 3, 4, and 5. Moreover, we
improved our sieving program to use 128-bit registers more efficiently (stage V in
Fig. 1). Finally, our sieving program became about 6 times faster than the first
one (stage I in Fig. 1), and the estimated total number of days for the collecting
relations phase became about 53.1 days. In the next paragraph, we explain the
details of the improvement in stage II, which is the most effective and important
improvement in our implementation of the lattice sieve.

Details of Stage II: In the lattice sieve, the main computation of sieving
for given lattice bases (r_1, s_1), (r_2, s_2) ∈ (GF(3^3)[x])^2 is as follows. For fixed
d ∈ GF(3^3)[x], whose degree is upper-bounded by a degree bound D, we compute
c_0 = −d (r_1 t + s_1)^{-1} (r_2 t + s_2) mod p for all pairs (p, t) ∈ {(p, t) | p ∈ F_R(B), t ≡ m
(mod p)} ∪ {(p, t) | (p, y − t) ∈ F_A(B)}, and compute c ∈ GF(3^3)[x], whose
degree is upper-bounded by a degree bound C, such that c = c_0 + kp where
k ∈ GF(3^3)[x]. We call the computation “sieving at d” in this section. For given
lattice bases, sieving at d is performed for all d of degree not larger than D. Note
that c_0 does not need to be computed when (r_1 t + s_1) ≡ 0 (mod p); therefore
we assume (r_1 t + s_1) ≢ 0 (mod p) in the following description.

In stage I of our implementation, we found that sieving at d for
deg p = 5 takes over 100 msec, whereas sieving at d for each of deg p = 1, 2, 3
and 4 takes about 10 msec or less. Therefore, we tried to improve the sieving
of degree 5. When we compute c_0 for p of degree 5, the degree of c_0 becomes 4
with probability about 26/27. On the other hand, the degree of the lattice bases
r_1, s_1, r_2, s_2 is 3 in most cases because the degree of special-Q is 6. On such
bases, degree bounds C and D can be chosen as 3 to satisfy condition (0), i.e.,
deg r ≤ 6 and deg s ≤ 6. These facts show that about 26/27 of the computation
of sieving for p of degree 5 is wasted. Therefore, we discuss how to
sieve only with the polynomial c_0, whose degree is not larger than 3, as follows.

Let a ∈ GF(3^3)[x] be −(r_1 t + s_1)^{-1} (r_2 t + s_2) mod p, then we have c_0 = d a mod
p. Let α_i ∈ GF(3^3) be the coefficient of the fourth-order term of x^i a mod p
for i = 0, 1, 2, 3. Since deg d ≤ 3, d is represented as d_3 x^3 + d_2 x^2 + d_1 x + 1 for
d_3, d_2, d_1 ∈ GF(3^3). Recall that we restricted d ≡ 1 mod x in our implementation
of the lattice sieve. Here we know that the degree of c_0 is not larger than 3 if
d_3 α_3 + d_2 α_2 + d_1 α_1 + α_0 = 0. Therefore, it is sufficient to perform sieving at d
for p in the factor base of degree 5 for only d satisfying the following property:

          { −α_1^{-1} K                 if α_1 ≠ 0,
    d_1 = {                                                        (9)
          { any element in GF(3^3)      if α_1 = 0 and K = 0,

where K = d_3 α_3 + d_2 α_2 + α_0. When α_1 = 0 and K = 0, we should compute
c_0 for d whose d_1 is any element in GF(3^3), and we cannot cut off any d_1;
therefore, we assume that α_1 ≠ 0 in the following description. Suppose that
we now fix lattice bases (r_1, s_1), (r_2, s_2) and a pair (p, t) where deg p = 5, then
each α_i for i = 0, 1, 2, 3 is also fixed. Therefore, since K depends only on d_2 and d_3,
the d_1 satisfying (9) is uniquely determined for given
d_2 and d_3. This implies that, since d_1 is in GF(3^3) whose cardinality is 27, we
can ignore 26 d_1's not satisfying (9) for given d_2 and d_3. In fact, the time of
sieving at d for all pairs (p, t) where deg p = 5 is reduced to about 1.5 msec by
ignoring d_1 not satisfying (9). Note that we need to compute K for given d_2 and
d_3 for all pairs (p, t). The time of computing K for all (p, t) takes about 150
msec in our implementation. Therefore, for all pairs (p, t) where deg p = 5, the
computations of K and sieving at d require about 7.1 msec at stage II (that is,
about 150/27 + 1.5 msec per d when the cost of K is amortized over the 27 values
of d_1), which is over 10 times faster than the computation of sieving at d at stage I.
As a result, our implementation of the lattice sieve at stage II becomes over 3 times faster
than that at stage I.
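
The selection rule for d_1 can be sketched as follows. This is a toy model, not the authors' SIMD code: elements of GF(3^3) = GF(3)[ω]/(ω^3 − ω − 1) are stored as coefficient triples, the four coefficients α_0, ..., α_3 are taken as given inputs, and all function names are hypothetical. The case α_1 = 0 with K ≠ 0, in which no d_1 can satisfy the linear condition, is handled by returning an empty list; that case is our own completion of the rule.

```python
import itertools

ZERO, ONE = (0, 0, 0), (1, 0, 0)
ELEMS = list(itertools.product(range(3), repeat=3))   # all 27 elements (c0, c1, c2)

def add(a, b):
    return tuple((x + y) % 3 for x, y in zip(a, b))

def neg(a):
    return tuple(-x % 3 for x in a)

def mul(a, b):
    """Product in GF(3)[w]/(w^3 - w - 1)."""
    c = [0] * 5
    for i in range(3):
        for j in range(3):
            c[i + j] += a[i] * b[j]
    c[2] += c[4]; c[1] += c[4]    # w^4 = w^2 + w
    c[1] += c[3]; c[0] += c[3]    # w^3 = w + 1
    return tuple(x % 3 for x in c[:3])

def inv(a):
    """Inverse by exhaustive search; adequate for a 27-element field."""
    return next(b for b in ELEMS if mul(a, b) == ONE)

def admissible_d1(alpha, d2, d3):
    """Values of d1 for which d3*a3 + d2*a2 + d1*a1 + a0 = 0."""
    a0, a1, a2, a3 = alpha
    K = add(add(mul(d3, a3), mul(d2, a2)), a0)
    if a1 != ZERO:
        return [mul(neg(inv(a1)), K)]           # the unique d1 = -a1^{-1} K
    return list(ELEMS) if K == ZERO else []     # all d1, or none

def brute_force_d1(alpha, d2, d3):
    """Reference: test every d1 against the linear condition."""
    a0, a1, a2, a3 = alpha
    K = add(add(mul(d3, a3), mul(d2, a2)), a0)
    return [d1 for d1 in ELEMS if add(mul(d1, a1), K) == ZERO]
```

For any α with α_1 ≠ 0, `admissible_d1` returns exactly one of the 27 candidates for each (d_2, d_3), which is the source of the 26/27 saving described above.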

4.2 Linear Algebra Phase 

After the collecting relations phase, we obtain a system of linear equations mod- 
ulo P151, which is described in Sectional The Galois action (20, 0 can re- 
duce the number of variables of the system of linear equations to one-third. 
Additionally, after the Galois action, the numbers of equations and variables of
the system of linear equations can be further reduced using filtering 0, i.e.,
singleton-clique and merging. To solve the system of linear equations defined by
this reduced matrix, we use the parallel Lanczos method jj, 120] . 

Galois Action: The Galois action to GF(3^{6·97})/GF(3^{3·97}) enables us to reduce
the number of variables of the system of linear equations to one-third (details
of the Galois action are discussed in 00). However, when we use the Galois
action, 151-bit large integers such as e_0 + e_1 r + e_2 r^2, where r = 3^{97·2} mod P151
and e_i is a small integer of a few bits, are added to elements of the system of linear
equations. This unfortunate fact eventually increases the data size of the reduced
matrix; therefore, high-capacity memory is required. To allay the increase in the
representation size of the elements, we store only a triplet (e_0, e_1, e_2) in the PC
memory, not a large 151-bit integer. Since e_i is small enough to be represented
by 8 bits, the size of the elements is reduced from 151 to 24 bits on average. We
call this representation the “r-adic structure”. Note that the r-adic structure is
used for the Galois action and singleton-clique.
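
A minimal sketch of the r-adic structure follows. The modulus below is a stand-in prime (2^127 − 1), not the paper's P151, whose value is not given in this section; the exponent of r is taken as read from the text; the function names are hypothetical. The point illustrated is that triplets can be added component-wise and only expanded to a large residue when needed.

```python
P = 2**127 - 1            # stand-in prime modulus, NOT the paper's P151
R = pow(3, 2 * 97, P)     # r = 3^(97*2) mod P, exponent as read from the text

def to_int(e):
    """Expand a triplet (e0, e1, e2) to e0 + e1*r + e2*r^2 mod P."""
    e0, e1, e2 = e
    return (e0 + e1 * R + e2 * R * R) % P

def add_triplets(a, b):
    """Add two matrix entries without leaving the compact representation."""
    return tuple(x + y for x, y in zip(a, b))

def fits_in_24_bits(e):
    """True if every component still fits in a signed 8-bit slot."""
    return all(-128 <= x <= 127 for x in e)

a, b = (3, -1, 2), (-2, 5, 1)
s = add_triplets(a, b)
print(s, fits_in_24_bits(s), to_int(s) == (to_int(a) + to_int(b)) % P)
# prints: (1, 4, 3) True True
```

Because the expansion is linear in (e_0, e_1, e_2), row additions of this kind commute with it, which is what makes the compact form usable during singleton-clique.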

Singleton-Clique and Merging: Filtering consists of two parts, singleton- 
clique and merging. Singleton-clique deletes unnecessary rows and columns to 
reduce the size of the matrix. In our implementation of singleton-clique, we
maintained 20000 more rows than columns to prevent accidentally
decreasing the rank of the matrix. After that, merging, a weight-controlled
Gaussian elimination, is performed. In merging, for a small integer k, a column with a
weight not larger than k is deleted by row elimination, with the pivot
selection controlled so that the weight of the matrix is as small as possible. This operation
is called k-way merging. In our implementation of merging, we converted the
data representation of the matrix from the r-adic structure to a large 151-bit 
integer structure, since merging on the r-adic structure cannot reduce the size 
of the matrix enough due to the restriction of the pivot selection. More details 
are described in |1‘! . 
Parallel Lanczos Method: By using the parallel Lanczos method, we
solve the system of linear equations defined by the matrix reduced via the Galois
action, singleton-clique, and merging. For parallel computing, the matrix should
be split into sub-matrices, i.e., split into N = N_1 × N_2 sub-matrices for N nodes,
and nodes communicate among N_1 nodes or N_2 nodes. To reduce the
synchronization time before communicating, the matrix is split so that each sub-matrix
has almost the same weight. Our machine environment for the parallel Lanczos
method consisted of 22 nodes, and each node had 12 CPU cores and 2 NICs.
The 2 NICs were connected to a 48-port Gbit HUB, i.e., 44 ports were used for
connecting 22 nodes. All 22 nodes could be used, so we had a choice for the machine
environment: 20 = 5 × 4, 21 = 7 × 3 or 22 = 11 × 2. Using 20 nodes requires the
least communication costs but the most computational costs, and using 22 nodes
requires the most communication costs but the least computational costs. Using
21 nodes was the best for our implementation; therefore, we used 21 nodes.

For computation in the parallel Lanczos method, many modular multiplications
of 151-bit integers × 151-bit integers modulo P151 are required due to the
Galois action. We implemented Montgomery multiplication optimized to 151-bit
integers using assembly language. Our program is several times faster
than straightforward modular multiplication using GMP (http://gmplib.org/)
for multiple precision arithmetic.
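
For readers unfamiliar with the technique, the textbook form of Montgomery multiplication is sketched below. It is not the authors' assembly routine: the modulus is a stand-in odd 151-bit integer (the value of P151 is not given in this section), and the choice of 192 bits for the Montgomery radix, corresponding to three 64-bit words, is an assumption. It requires Python 3.8 or later for `pow(n, -1, r)`.

```python
def montgomery_params(n, k):
    """Return (r, n') with r = 2^k and n * n' = -1 mod r; n must be odd."""
    r = 1 << k
    n_neg_inv = (-pow(n, -1, r)) % r
    return r, n_neg_inv

def redc(t, n, k, n_neg_inv):
    """Montgomery reduction: returns t * 2^(-k) mod n for 0 <= t < n * 2^k."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_neg_inv) & mask    # makes t + m*n divisible by 2^k
    u = (t + m * n) >> k                   # exact division, done by a shift
    return u - n if u >= n else u

N = (1 << 150) + 3        # stand-in odd 151-bit modulus, NOT the paper's P151
K_BITS = 192              # assumed radix: three 64-bit words
_, N_NEG_INV = montgomery_params(N, K_BITS)

def to_mont(a):
    return (a << K_BITS) % N

def from_mont(a_bar):
    return redc(a_bar, N, K_BITS, N_NEG_INV)

def mont_mul(a_bar, b_bar):
    """Multiply two values held in Montgomery form; result stays in that form."""
    return redc(a_bar * b_bar, N, K_BITS, N_NEG_INV)

a, b = 0x1234567890ABCDEF1234567890ABCDEF, 0x0FEDCBA987654321
print(from_mont(mont_mul(to_mont(a), to_mont(b))) == (a * b) % N)
# prints: True
```

The benefit in a long Lanczos iteration is that operands stay in Montgomery form, so each multiplication needs only shifts, masks, and multiplications instead of a division by the modulus.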

After the computation of the parallel Lanczos method started, we improved
our codes of the parallel Lanczos method (for example, efficient register usage,
overlapping communications and computations). With these improvements, our
codes became about 15% faster than our initial implementation.


4.3 Individual Logarithm Phase 

As mentioned in Section 3.1, log_g η_T(Q_π, Q_π) and log_g η_T(Q_π, Q_e) are required
to solve our target problem. To compute them, rationalization and special-Q
descent were used. For simplicity, let T be η_T(Q_π, Q_π) or η_T(Q_π, Q_e) in
the following paragraphs.

In the rationalization, we randomize T such that the randomized element is
M-smooth for a small enough integer M > B by the following process. First,
we randomize T by z = g^γ T (mod f) for a random integer γ ∈ Z_{P151}. We
then rationalize z as z = z_1/z_2 (mod f) where the degrees of z_1 and z_2 are about
deg f/2, and check whether both z_1 and z_2 are M-smooth. Then, computing
log_g T is reduced to computing logarithms of irreducible factors of M-smooth
elements z_1 and z_2.

M-smooth elements z_i for i = 1, 2 contain some irreducible factors of degree
larger than B whose logarithms are not computed in the linear algebra phase.
To compute these logarithms, the special-Q descent is usually used. In the
special-Q descent, the lattice sieve is recursively conducted with an irreducible
factor of degree larger than B, which is contained in z_i or in a relation generated
during the special-Q descent, as a special-Q.
5 Experimental Results

We succeeded in solving a DLP over GF(3^{6·97}) by using the FFS with our efficient
implementation techniques discussed in Section 4. In this section, we report our
computational results, such as the computational time of each phase of the FFS
and the number of relations.


5.1 Polynomial Selection 

The FFS has six parameters κ, d_H, d_m, B, R, and S, as defined in Section 2.2,
and we set (κ, d_H, d_m, B, R, S) = (3, 6, 33, 6, 6, 6) for our target problem, based
on the reason given in Section 3.2. In the polynomial selection phase, we can
extract appropriate polynomials such as the definition polynomial H(x, y) of a
function field described in Section 3.2 in one minute, so the computational cost
of the polynomial selection phase is negligibly small.


5.2 Collecting Relations Phase 

In the collecting relations phase, we search for many relations that are equations of
the form (0) to generate a system of linear equations by using the lattice sieve and
the free relation. We explain our computational results of the collecting relations
phase, e.g., the number of relations obtained in this phase and the computational
time of the lattice sieve for one special-Q.

Lattice Sieve. Each special-Q has to be chosen from F_R(6)\F_R(5). The number
of elements of F_R(6)\F_R(5) is 64566684, and the size of the table of those
elements is about 500 MB. Since our program of the lattice sieve is computed
using many nodes, it is not convenient to pick up the element from that 500-MB
table as a special-Q. Therefore, we selected a special-Q by randomly generating
an irreducible polynomial in GF(3^3)[x] of degree 6, which is in F_R(6)\F_R(5),
and iterated the computation of the lattice sieve for the special-Q.

We prepared 47 PCs (in total 212 CPU cores) for the lattice sieve. The
computation of the lattice sieve began on May 14, 2011, and we continued optimizing
our program of the collecting relations phase. As discussed in Section 4.1, we
applied several improvements to our program of the collecting relations phase:
the lattice sieve for the JL06-FFS, the lattice sieve with SIMD, and optimization
for our parameters. Figure 1 in Section 4.1 shows the process of our improvements
in the collecting relations phase for the first two weeks. The total time for
the collecting relations phase shortened due to our improvements. Finally, the
computation finished on September 9, 2011 and required 118 days, including the
time lost to programming errors, updating our codes, and power outages.
The real computational time of the lattice sieve was equivalent to 53.1 days using
212 CPU cores such as Xeon E5440.

Table 2 summarizes the process of generating relations in the collecting relations
phase. It might seem that the number of duplicate relations is very small
compared to the integer factorization case using the number field sieve. This
arises from the fact that the size of the sieving space in our parameters is so
large compared to that case.

Table 2. Number of collected relations in collecting relations phase

lattice sieve    159032292 relations obtained from 2500000 special-Q's
                 (64.91 relations/special-Q, 389 sec/special-Q)
                 153815493 unique (non-duplicated) relations
                 obtained from 2449991 unique special-Q's
free relation    33786299 relations
total            187602242 relations (consist of 134697663 elements in the factor base)


Table 3. Compressing matrix using Galois action, singleton-clique and merging

method                size of matrix
before compressing    187602242 equations × 134697663 variables
Galois action         159394665 equations × 45049572 variables
singleton-clique      14060794 equations × 14040791 variables
6-way merging         6141443 equations × 6121440 variables

Free Relation. The free relation gives us additional relations not generated by
a sieving algorithm such as the lattice sieve. The details of the free relation are
given in j2fl|. As shown in Table 2, the free relation gave us 33786299 relations.
Eventually, we obtained a system of linear equations consisting of 187602242
equations and 134697663 variables. Note that there are 451002 elements in the
factor base that do not appear in the 187602242 relations.

5.3 Linear Algebra Phase 

In the linear algebra phase, we first reduced the size of the matrix by the
Galois action and filtering, and then performed the parallel Lanczos method for
the reduced matrix. Table 3 shows the process of compressing the
matrix.

Galois Action. As mentioned in Section 4.2, the Galois action reduced the
size of the matrix generated in the collecting relations phase to one-third since
κ = 3. To allay the fact that the size of each element of the matrix increases
from a few bits to 151 bits due to the Galois action, we used the r-adic structure
mentioned in Section 4.2.

Singleton-Clique and Merging. After using the Galois action, we additionally
reduce the variables and equations of the matrix by singleton-clique and merging.
Using a PC, the computation for singleton-clique took about 3 hours, and that
for merging took about 10 hours. After 6-way merging, we started the computation
of the parallel Lanczos method for the 6-way merged matrix. See 0 for more
details about our results of singleton-clique and merging.
Table 4. Computational time of parallel Lanczos method for 6-way merged matrix

calculation time/loop        626.3 msec
synchronization time/loop    46.5 msec
communication time/loop      457.3 msec
total time/loop              1130.1 msec
number of loops              6121438
total time                   80.1 days


Parallel Lanczos Method. We used the parallel Lanczos method to
solve the system of linear equations defined by the 6-way merged matrix. Note
that this matrix is sparse and defined over Z_{P151}, where P151 is the 151-bit
prime number given in Section 1.1. The computation of the parallel Lanczos
method started on January 16, 2012, and was conducted on 21 PCs, which were
connected via a 48-port Gbit HUB. As mentioned in Section 4.2, we continued
improving our codes of the parallel Lanczos method after computation began.
The computational times of our improved codes are listed in Table 4. Finally,
computation finished on April 14, 2012. The computation for the parallel Lanczos
method took 90 days including time losses similar to our implementation of the
lattice sieve. The real computational time is equivalent to 80.1 days using 252
CPU cores such as Xeon X5650.
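
The total in Table 4 follows from the per-loop times and the number of loops. The few lines below merely redo that arithmetic; the variable names are ours.

```python
loops = 6_121_438
ms_per_loop = {"calculation": 626.3, "synchronization": 46.5, "communication": 457.3}

total_ms_per_loop = sum(ms_per_loop.values())                  # about 1130.1 msec
total_days = loops * total_ms_per_loop / 1000 / 86_400         # msec -> sec -> days
communication_share = ms_per_loop["communication"] / total_ms_per_loop

print(round(total_ms_per_loop, 1), round(total_days, 1), round(communication_share, 2))
# prints: 1130.1 80.1 0.4
```

About 40% of each loop is spent on communication, which is consistent with the trade-off between 20, 21, and 22 nodes discussed in Section 4.2.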

5.4 Individual Logarithm Phase 

Our target is to compute log_g η_T(Q_π, Q_e) and log_g η_T(Q_π, Q_π) for some g ∈ G_2,
as mentioned in Section 3.1.

First, we computed the rationalization described in Section 4.3. Let g be a
polynomial (x + ω)^{(3^{6·97} − 1)/P151} ∈ G_2, where ω is a polynomial basis of GF(3^3) =
GF(3)[ω]/(ω^3 − ω − 1). Note that g is a generator of G_2 ⊂ GF(3^{6·97})^* and x + ω is
a monic irreducible polynomial in F_R(B) of degree 1. We set M = 15 and search for a
pair (z_1, z_2) (and (z'_1, z'_2)) ∈ (GF(3^3)[x])^2 such that η_T(Q_π, Q_e) · g^{γ_1} = z_1/z_2 (and
η_T(Q_π, Q_π) · g^{γ_2} = z'_1/z'_2) where z_i (and z'_i) are M_i-smooth (where M_i ≤ M)
for some γ_1, γ_2 ∈ Z_{P151} and i = 1, 2. We found z_1 and z_2, which are 13- and
15-smooth (and z'_1 and z'_2, which are 15- and 14-smooth), respectively. These
computations were conducted on 168 CPU cores and required 7 days for each
computation.

η_T(Q_π, Q_e) · g^{γ_1} = (13-smooth)/(15-smooth),

γ_1 = 2514037766787322013334785428291787565870435706,

η_T(Q_π, Q_π) · g^{γ_2} = (15-smooth)/(14-smooth),

γ_2 = 2657516740789758289434702436228062607247517136.

Next, we performed special-Q descent for each irreducible factor of smooth
elements obtained by the rationalization. These computations were conducted on
168 CPU cores and took about 0.5 days for each of η_T(Q_π, Q_e) and η_T(Q_π, Q_π).
Thus, the computation of the individual logarithm phase took 15 days: (7 days
(for rationalization) + 0.5 days (for special-Q descent)) × 2 elements.

By using the logarithms of the corresponding elements in the factor base
obtained from the linear algebra phase, we could compute log_g η_T(Q_π, Q_e) and
log_g η_T(Q_π, Q_π). The logarithm of each element is as follows:

log_g η_T(Q_π, Q_e) = 1540966625957007958347823268423957036469656370,
log_g η_T(Q_π, Q_π) = 1630281950635507295663809171217833096970449894.

Finally, we obtained the logarithm of the target element:

s = log_{η_T(Q_π, Q_e)} η_T(Q_π, Q_π)

= 1752799584850668137730207306198131424550967300.

This is the solution of the ECDLP of equation Q_π = [s]Q_e.
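
The last step can be written out as follows. The derivation is a sketch that assumes only the bilinearity of the η_T pairing and that log_g η_T(Q_π, Q_e) is invertible modulo P151; the value of P151 is given elsewhere in the paper, so the division is not evaluated numerically here.

```latex
\begin{align*}
Q_\pi = [s]\,Q_e
  &\;\Longrightarrow\; \eta_T(Q_\pi, Q_\pi) = \eta_T(Q_\pi, Q_e)^{\,s}, \\
s &= \log_{\eta_T(Q_\pi, Q_e)} \eta_T(Q_\pi, Q_\pi)
   \equiv \frac{\log_g \eta_T(Q_\pi, Q_\pi)}{\log_g \eta_T(Q_\pi, Q_e)} \pmod{P_{151}}.
\end{align*}
```

This is why two individual logarithms to the common base g are sufficient to recover s.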

6 Concluding Remarks 

We evaluated the security of pairing-based cryptosystems using the η_T pairing
over supersingular elliptic curves on finite field GF(3^n). We focused on the case
of n = 97 since many implementers have reported practically relevant high-speed
implementations of the η_T pairing with n = 97 in both software and hardware. In
particular, we examined the difficulty in solving the discrete logarithm problem
(DLP) over GF(3^{6·97}) by our implementation of the function field sieve (FFS).

To reduce the computational cost of the FFS for solving the DLP, we proposed
several efficient implementation techniques. In the collecting relations phase,
we implemented the lattice sieve for the JL06-FFS with SIMD and introduced
improvements by optimizing for factor bases of each degree; therefore, our lattice
sieve for the JL06-FFS became about 6 times faster than the first one. The main
difference from the number field sieves for integer factorization is the linear
algebra phase, namely, we have to deal with a large modulus of 151-bit prime for
the computation of the FFS. We thus performed filtering (singleton-clique and
merging) by carefully considering the data structure of large integers arising
from the Galois action, so that we can efficiently conduct the parallel Lanczos
method. From the above improvements, we succeeded in solving the DLP over
GF(3^{6·97}) in 148.2 days by using PCs with 252 CPU cores. Our computational
results contribute to the security estimation of pairing-based cryptosystems using
the η_T pairing. In particular, they show that the security parameter of such
pairing-based cryptosystems must be chosen with n > 97.

Finally, we show a very rough estimation of required computational power for
solving the DLP over GF(3^{6n}) with n > 97. Our experiment on the DLP over
GF(3^{6n}) with n = 97 used 252 CPU cores of mainly 2.67 GHz Xeon for 148.2
days, which are equivalent to 2^{62.9} clock cycles. From the analysis of jjUl], the
computational complexities of breaking the DLP over GF(3^{6n}) with n = 163
and 193 become 2^{15.4} and 2^{19.1} times larger than that with n = 97, respectively.
Therefore, we could estimate that about 2^{78.3} and 2^{82.0} clock cycles are required
for breaking the DLP over GF(3^{6n}) with n = 163 and 193, respectively. On
the other hand, the currently second fastest supercomputer K has a throughput
of about 10.5 petaflop/s from http://www.top500.org/, and it performs
about 2^{78.1} floating-point operations in one year. If one floating-point
operation on the CPU of the K is equivalent to one clock cycle of logical operation
on the Xeon core, we might be able to break the DLP over GF(3^{6·163}) using our
implementation on supercomputer K in one year.
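
The exponents in this estimate can be reproduced with a few lines of arithmetic. The sketch below takes the core count, clock rate, running time, scaling factors, and throughput from the text, and assumes a 365-day year; it only checks the rounding and adds no new data.

```python
from math import log2

SECONDS_PER_DAY = 86_400

# Our experiment: 252 cores at 2.67 GHz for 148.2 days.
cycles_n97 = log2(252 * 2.67e9 * 148.2 * SECONDS_PER_DAY)

# Scaling factors quoted from the cited analysis (as powers of two).
cycles_n163 = cycles_n97 + 15.4
cycles_n193 = cycles_n97 + 19.1

# Supercomputer K: 10.5 petaflop/s sustained for one 365-day year.
k_flop_per_year = log2(10.5e15 * 365 * SECONDS_PER_DAY)

print(round(cycles_n97, 1), round(cycles_n163, 1),
      round(cycles_n193, 1), round(k_flop_per_year, 1))
# prints: 62.9 78.3 82.0 78.1
```

The gap between 2^{78.3} and 2^{78.1} is a factor of about 1.15, so "one year" should be read as an order-of-magnitude statement.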
Abstract. Projecting bilinear pairings have frequently been used for
designing cryptosystems since they were first derived from composite order
bilinear groups. There have been only a few studies on the (im)possibility
of projecting bilinear pairings. Groth and Sahai showed that projecting
bilinear pairings can be achieved in the prime-order group setting. They
constructed both projecting asymmetric bilinear pairings and projecting
symmetric bilinear pairings, where a bilinear pairing e is symmetric if it
satisfies e(g, h) = e(h, g) for any group elements g and h; otherwise, it is
asymmetric.

In this paper, we provide impossibility results on projecting bilinear 
pairings in a prime-order group setting. More precisely, we specify the 
lower bounds of 

1. the image size of a projecting asymmetric bilinear pairing 

2. the image size of a projecting symmetric bilinear pairing 

3. the computational cost for a projecting asymmetric bilinear pairing 

4. the computational cost for a projecting symmetric bilinear pairing 

in a prime-order group setting naturally induced from the k-linear
assumption, where the computational cost means the number of generic
operations.

Our lower bounds regarding a projecting asymmetric bilinear pairing 
are tight, i.e., it is impossible to construct a more efficient projecting 
asymmetric bilinear pairing than the constructions of Groth-Sahai and 
Freeman. However, our lower bounds regarding a projecting symmetric
bilinear pairing differ from Groth and Sahai's results regarding a
symmetric bilinear pairing. We fill these gaps by constructing projecting
symmetric bilinear pairings.

In addition, on the basis of the proposed symmetric bilinear pair- 
ings, we construct more efficient instantiations of cryptosystems that 
essentially use the projecting symmetric bilinear pairings in a modular 
fashion. Example applications include new instantiations of the Boneh- 
Goh-Nissim cryptosystem, the Groth-Sahai non-interactive proof system, 
and Seo-Cheon round optimal blind signatures proven secure under the 
DLIN assumption. These new instantiations are more efficient than the 
previous ones, which are also provably secure under the DLIN assump- 
tion. These applications are of independent interest. 

X. Wang and K. Sako (Eds.): ASIACRYPT 2012, LNCS 7658, pp. 61-79, 2012.
1 Introduction

A bilinear group is a tuple of abelian groups with a non-degenerate bilinear
pairing. Projecting bilinear pairings, which are bilinear pairings with
homomorphisms that satisfy a commutative property, have frequently been used for
designing cryptosystems since they were first derived from composite order
bilinear groups, though Freeman identified and named the projecting property
recently [15]. Of special interest are the Groth-Sahai non-interactive proof
system and the Boneh-Goh-Nissim cryptosystem, both of which essentially
use the projecting property and have numerous applications in various fields in
cryptography. For example, the Groth-Sahai proofs were used to construct ring
signatures, group signatures, round optimal blind signatures,
verifiable shuffles, a universally composable adaptive oblivious transfer
protocol, a group encryption scheme, anonymous credentials, and
malleable proof systems. For its part, the Boneh-Goh-Nissim cryptosystem
was used for designing private searching on streaming data, non-interactive
zero-knowledge, shuffling, and privacy-preserving set operations.

(Im) possibility of Projecting Bilinear Pairings: Although the projecting 
bilinear pairings are often used for designing various cryptosystems, there have 
been only a few studies on the (im)possibility of projecting bilinear pairings. 
Groth and Sahai demonstrated that projecting bilinear pairings can be
achieved in the prime-order group setting. They provided two distinct
constructions in the prime-order group setting: projecting asymmetric bilinear pairings and
projecting symmetric bilinear pairings, where a bilinear pairing e is symmetric if
it satisfies e(g, h) = e(h, g) for any group elements g and h; otherwise, it is
asymmetric. On the basis of this idea of projecting bilinear pairings, they developed
non-interactive proof systems for quadratic equations over modules that can be
instantiated in composite-order bilinear groups, product groups of prime-order
bilinear groups with asymmetric bilinear pairings, and product groups of
prime-order groups with symmetric bilinear pairings. By extending Groth-Sahai's idea,
Freeman generalized Groth-Sahai's projecting asymmetric bilinear pairings.¹
Groth-Sahai and Freeman’s constructions of projecting bilinear pairings allow for 
the simultaneous treatment of subgroup indistinguishability. To use projecting 
bilinear pairings for designing cryptographic protocols, we need to deal with 
cryptographic assumptions such as subgroup decision assumption at the same 
time. Meiklejohn, Shacham, and Freeman have shown some impossibility
results for projecting bilinear pairings, e.g., that projecting bilinear pairings
cannot simultaneously have a cancelling property if the subgroup
indistinguishability is naturally induced from the k-linear assumption. Recently, Seo and
Cheon proved that bilinear pairings can be simultaneously projecting and
cancelling when the subgroup decision assumption holds in the generic group
model.

¹ Freeman identified the other property of bilinear pairings in a composite-order group
setting, called cancelling, and demonstrated how to achieve the cancelling bilinear
pairings in the prime-order group setting.

Contribution: In this paper, our contribution is twofold. First, we aim to
answer the fundamental question of how efficient constructions for projecting
bilinear pairings can be. Second, we propose a construction of projecting
symmetric bilinear pairings that achieves the efficiency of our lower bounds, and
then provide several constructions of cryptosystems based on the proposal in a
modular fashion.

We focus on constructions only in the prime-order bilinear group setting since 
this type of group usually supports more efficient (group and bilinear pairing) 
operations than those in composite-order bilinear groups (see [El for a detailed 
comparison of composite and prime-order groups). We present several impossi- 
bility results of the projecting bilinear pairings in a prime-order group setting. 
More precisely, we specify the lower bound of 

1. the image size of a projecting asymmetric bilinear pairing 

2. the image size of a projecting symmetric bilinear pairing 

3. the computational cost for a projecting asymmetric bilinear pairing, and 

4. the computational cost for a projecting symmetric bilinear pairing 

in a prime-order group setting naturally induced from the decisional
Diffie-Hellman (DDH) assumption, the decisional linear (DLIN) assumption, and the
k-linear assumption, where the computational cost means the number of generic
operations. In this paper, we restrict ourselves to a framework in which the
subgroup indistinguishability relies in a natural way on simple assumptions
(i.e., the DDH, DLIN, and k-linear assumptions). This framework covers all
previous constructions by Groth-Sahai and Freeman, and this restriction on the
framework has already been used in [25] to show another impossibility result on
projecting bilinear pairings. As for the computational cost of projecting
bilinear pairings, we consider a slightly restricted computational model since
there are typically several ways to perform a given operation, which makes it
very difficult to compare all possible (even unknown) ways. We have two basic
assumptions in our computational model. First, we only count the number of
generic operations of the underlying elliptic curve group and the pairings; that
is, we assume that one cannot utilize information about the representation of
groups and bilinear pairing operations [37,8]. Second, we assume that the two
inputs of a projecting bilinear pairing are uniformly and independently chosen.
In special cases, additional information about the two inputs may lead to an
efficient alternative way of computing a pairing operation. For example, when one
computes $e(g_1, g_2)$ for two given inputs $g_1$ and $g_2$, where
$e : G \times G \to G_t$ is a pairing, if one knows $e(g, g)$, $a_1$, and $a_2$
such that $g_1 = g^{a_1}$ and $g_2 = g^{a_2}$ for a generator $g$ of $G$, then
one can perform one field multiplication and one exponentiation in $G_t$ instead
of performing $e$, since $e(g_1, g_2) = e(g, g)^{a_1 a_2}$. Since we want to
consider the computational cost of $e$ in general, that is, without any
additional information aside from the original two inputs, we assume that the
two inputs are uniformly and independently distributed in their respective
domains. Hence, our computational model rules out special cases like the above
example. Although our computational model does not perfectly correspond to the
real world, we believe that its lower computational bounds can aid our
understanding of the projecting property and enable us to locate efficient
constructions for projecting bilinear pairings.

2 Seo and Cheon's result does not contradict Meiklejohn et al.'s result. Rather,
they showed that there is a more general class of bilinear groups than
Meiklejohn et al. considered and that some of these can be both cancelling and
projecting.

In this study, our lower bounds imply that Freeman's construction of
projecting asymmetric bilinear pairings is optimal; that is, it is the most
efficient construction for projecting asymmetric bilinear pairings. In contrast,
our lower bounds for projecting symmetric bilinear pairings are different from
those of Groth-Sahai [22]. We fill this gap by constructing projecting
symmetric bilinear pairings and demonstrating that our construction achieves an
efficiency coincident with the lower bounds.

The proposed projecting symmetric bilinear pairings can be used to create
more efficient instantiations of cryptosystems that essentially use the
projecting property and symmetric bilinear pairings, in a modular fashion. To
show that the proposed projecting symmetric bilinear pairings can be adapted to
various cryptosystems, we apply them to three distinct cryptosystems and create
new efficient instantiations of the Groth-Sahai non-interactive proof system
[22], the Boneh-Goh-Nissim cryptosystem, and the Seo-Cheon round optimal blind
signatures that are provably secure under the DLIN assumption. The proposed
instantiation of the non-interactive proof system has a faster verification than
Groth-Sahai's instantiation based on the DLIN assumption, and the proposed
instantiation of the Boneh-Goh-Nissim cryptosystem has a smaller ciphertext size
and a faster decryption algorithm than Freeman's instantiation based on the
DLIN assumption. We can also reduce the verification costs of the Seo-Cheon
round optimal blind signatures. These applications are of independent interest.
Our new instantiation is based on the DLIN assumption, so we can improve the
efficiency of all subsequent protocols that use Groth-Sahai's instantiation
(based on the DLIN assumption).

We should note here that symmetric bilinear pairings require the use of super- 
singular elliptic curves and thus the associated bilinear groups are larger than 
those with asymmetric bilinear pairings using ordinary curves (please see ns] 
for a detailed comparison). However, some constructions of pairing-based
cryptosystems essentially use the symmetric property of bilinear pairings (e.g.,
Groth-Ostrovsky-Sahai zero-knowledge proofs [21]). Therefore, the proposed
projecting symmetric bilinear pairings can be used for designing such
cryptosystems.


3 The Seo-Cheon round optimal blind signature scheme can be considered a
prime-order group version of the Meiklejohn-Shacham-Freeman round optimal blind
signature scheme in composite-order groups [25]. Since we only consider
prime-order group settings in this paper, we provide a new instantiation of the
Seo-Cheon scheme instead of the Meiklejohn-Shacham-Freeman scheme.

Modular Approach in Cryptography: Generally speaking, a modular approach
for cryptosystems leads to a simple design but inefficient constructions in
comparison to an ad hoc approach. Recently, a few exceptions have been found in
structure-preserving cryptography and mathematical structures [26,27].

Structure-preserving schemes enable one to construct modular protocols while
preserving conceptual simplicity and yielding reasonable efficiency at the same
time. Structure-preserving signatures, commitments, and encryptions restrict
all components in schemes to group elements, so the schemes can easily be
combined with Groth-Sahai proofs [22]. In a modular fashion, round optimal
blind signatures, group signatures, and anonymous proxy signatures can be
derived from structure-preserving signatures, and oblivious trusted third parties
can be achieved due to structure-preserving encryptions. There have been
some impossibility results for structure-preserving cryptography. These
spare us effort on impossible goals and widen our understanding regarding
modular constructions.

Okamoto and Takashima introduced a mathematical structure called
“dual pairing vector spaces” that can be instantiated using a product of
bilinear groups or a Jacobian variety of a supersingular curve of genus > 1.
On the basis of these dual pairing vector spaces, a homomorphic encryption
scheme, functional encryption schemes [27,28,30], an attribute-based signature
scheme, and a (hierarchical) identity-based encryption scheme have been
proposed.

Open Problem: It would be interesting to extend the (im)possibility of the
projecting property into a wider framework than ours. Furthermore, finding 
other applications of projecting pairings is also interesting. 

Road Map: In Section 2, we give definitions for bilinear groups, the projecting
property, and cryptographic assumptions. In Section 3, we explain our
impossibility results of projecting bilinear pairings. In Section 4, we show the
optimality of Groth-Sahai and Freeman's projecting asymmetric bilinear pairings
and give our construction for optimal projecting symmetric bilinear pairings. In
Section 5, we apply the proposed projecting symmetric bilinear pairings to three
distinct cryptosystems: the Groth-Sahai non-interactive proof system, the
Boneh-Goh-Nissim cryptosystem, and the Seo-Cheon round optimal blind signatures.

2 Definition 

We use the notation $x \xleftarrow{\$} A$ to mean that, if $A$ is a finite group
$G$, an element $x$ is uniformly chosen from $G$, and, if $A$ is an algorithm,
$A$ outputs $x$ by using its own random coins. We use $[i,j]$ to denote the set
of integers $\{i, \ldots, j\}$, $\langle g_1, \ldots, g_n \rangle$ to denote the
group generated by $g_1, \ldots, g_n$, and $\mathbb{F}_p$ to denote a finite
field of prime order $p$. For a map $\tau : T_D \to T_R$ and any subset $S_D$ of
$T_D$, $\tau(S_D) := \{\tau(s) \mid s \in S_D\}$. All values in our paper are
outputs of some functions taking the security parameter $\lambda$, and $\approx$
denotes that the difference between both sides is a negligible function in
$\lambda$.

We use two common mathematical notions: the internal direct sum, denoted by
$\oplus$, and the tensor product (Kronecker product), denoted by $\otimes$. For
an abelian group $G$, if $G_1$ and $G_2$ are subgroups of $G$ such that
$G = G_1 + G_2 = \{g_1 \cdot g_2 \mid g_1 \in G_1, g_2 \in G_2\}$ and
$G_1 \cap G_2 = \{1_G\}$ for the identity $1_G$ of $G$, then we write
$G = G_1 \oplus G_2$. If $A = (a_{i,j})$ is an $m_1 \times m_2$ matrix and
$B = (b_{i,j})$ is an $\ell_1 \times \ell_2$ matrix, the tensor product
$A \otimes B$ is the $m_1\ell_1 \times m_2\ell_2$ matrix whose $(i,j)$-th block
is $a_{i,j}B$, where we consider $A \otimes B$ as $m_1 \times m_2$ blocks. That
is,

$$A \otimes B = \begin{bmatrix} a_{1,1}B & \cdots & a_{1,m_2}B \\ \vdots & \ddots & \vdots \\ a_{m_1,1}B & \cdots & a_{m_1,m_2}B \end{bmatrix} \in Mat_{m_1\ell_1 \times m_2\ell_2}(\mathbb{F}_p).$$

We use several properties of the internal direct sum and tensor product. Every
element $g$ in $G$ has a unique representation if $G = G_1 \oplus G_2$. That is,
$g \in G$ can be uniquely written as $g = g_1 g_2$ for some $g_1 \in G_1$ and
$g_2 \in G_2$. If two matrices $A$ and $B$ are invertible, then $A \otimes B$ is
also invertible and the inverse is given by
$(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$. The transposition operation is
distributive over the tensor product. That is,
$(A \otimes B)^t = A^t \otimes B^t$. We sometimes consider a vector over
$\mathbb{F}_p$ as a matrix with one row.
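The two tensor-product identities above can be checked numerically. The following Python sketch is only an illustration: the prime P = 101, the matrices A and B, and all function names are assumptions chosen for the example and do not come from the paper.

```python
# Toy check of the tensor-product identities over F_P.
# P and the matrices A, B are illustrative assumptions, not values from the paper.
P = 101


def kron(a, b):
    """Kronecker product mod P: the (i, j)-th block of the result is a[i][j] * b."""
    return [[(x * y) % P for x in row_a for y in row_b] for row_a in a for row_b in b]


def matmul(a, b):
    """Matrix product mod P."""
    return [[sum(x * y for x, y in zip(row, col)) % P for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse(a):
    """Gauss-Jordan inverse mod the prime P; raises ValueError if singular."""
    n = len(a)
    m = [[v % P for v in row] + [int(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c]), None)
        if pivot is None:
            raise ValueError("matrix is singular modulo P")
        m[c], m[pivot] = m[pivot], m[c]
        scale = pow(m[c][c], -1, P)  # modular inverse (Python 3.8+)
        m[c] = [(v * scale) % P for v in m[c]]
        for r in range(n):
            if r != c and m[r][c]:
                factor = m[r][c]
                m[r] = [(vr - factor * vc) % P for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


A = [[1, 2], [3, 4]]
B = [[2, 0, 1], [1, 1, 0], [0, 3, 1]]
AB = kron(A, B)
IDENTITY = [[int(i == j) for j in range(len(AB))] for i in range(len(AB))]

inverse_ok = inverse(AB) == kron(inverse(A), inverse(B))
transpose_ok = transpose(AB) == kron(transpose(A), transpose(B))
identity_ok = matmul(AB, inverse(AB)) == IDENTITY
print(inverse_ok, transpose_ok, identity_ok)
```

Working modulo a prime rather than with floating-point numbers keeps every comparison exact, which matches the setting of matrices over $\mathbb{F}_p$ used in the rest of the paper.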


2.1 Bilinear Groups and Projecting Bilinear Pairings 

Definition 1. Let $\mathcal{G}$ be an algorithm that takes as input the security
parameter $\lambda$. We say that $\mathcal{G}$ is a bilinear group generator if
$\mathcal{G}$ outputs a description of five finite abelian groups ($G$, $G_1$,
$H$, $H_1$, and $G_t$) and a map $e$ such that $G_1 \subset G$,
$H_1 \subset H$, and $e : G \times H \to G_t$ is a non-degenerate bilinear
pairing; that is, it satisfies

• Bilinearity: $e(g_1 g_2, h_1 h_2) = e(g_1,h_1)e(g_1,h_2)e(g_2,h_1)e(g_2,h_2)$
for $g_1, g_2 \in G$ and $h_1, h_2 \in H$,

• Non-degeneracy: for $g \in G$, if $e(g, h) = 1$ $\forall h \in H$, then
$g = 1$. Similarly, for $h \in H$, if $e(g, h) = 1$ $\forall g \in G$, then
$h = 1$.

In addition, we assume that group operations in each group ($G$, $H$, and
$G_t$), bilinear pairing computations, random samplings from each group, and
membership-check in each group are efficiently computable (i.e., polynomial time
in $\lambda$).

If the order of the output groups of $\mathcal{G}$ is prime $p$, we call
$\mathcal{G}$ a bilinear group generator of prime order and write
$\mathcal{G}_1 \to (p, \mathbb{G}, \mathbb{H}, \mathbb{G}_t, \hat{e})$; that is,
$\mathbb{G}$, $\mathbb{H}$, and $\mathbb{G}_t$ are finite abelian groups of prime
order $p$.

If $G = H$, $G_1 = H_1$, and $e(g,h) = e(h,g)$ for all $g, h \in G$, we say that
$\mathcal{G}$ is symmetric. Otherwise, we say that $\mathcal{G}$ is asymmetric.

We define the projecting property of bilinear pairings.

Definition 2. Let $\mathcal{G}$ be a bilinear group generator, and
$\mathcal{G} \to (G, G_1, H, H_1, G_t, e)$. We say that $\mathcal{G}$ is
projecting if there exist a subgroup $G'_t \subset G_t$ and three homomorphisms
$\pi : G \to G$, $\bar{\pi} : H \to H$, and $\pi_t : G_t \to G_t$ such that

1. $\pi(G) \neq \{1_G\}$, $\bar{\pi}(H) \neq \{1_H\}$, and
$\pi_t(e(G,H)) \neq \{1_t\}$, where $1_G$, $1_H$, and $1_t$ are the identities of
$G$, $H$, and $G_t$, respectively.

2. $G_1 \subset \ker(\pi)$, $H_1 \subset \ker(\bar{\pi})$, and
$G'_t \subset \ker(\pi_t)$.

3. $\pi_t(e(g, h)) = e(\pi(g), \bar{\pi}(h))$ for all $g \in G$ and $h \in H$.

If $\mathcal{G}$ is symmetric, set $\pi = \bar{\pi}$.

Note that in Definition 2 we slightly revised Freeman's original projecting
definition to fit our purpose. First, we added a requirement for the
homomorphisms to be non-trivial (first condition of Definition 2). If we allowed
trivial homomorphisms, they would satisfy the projecting property for every
bilinear group generator. Since trivial homomorphisms may not be helpful in
designing cryptographic protocols, our modification is quite reasonable. Second,
our definition requires only the existence of $G'_t$ and the homomorphisms, while
Freeman required them to be output. Since our definition is weaker than
Freeman's (if we ignore our first modification), our main results (the lower
bounds and optimal construction) are meaningful. Several other researchers have
used an existence definition like ours instead of Freeman's definition for the
projecting property.
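To make the three conditions of the projecting property concrete, the following Python sketch works entirely "in the exponent": a group element $\mathfrak{g}^{\vec{x}}$ is represented by its exponent vector $\vec{x}$, and the pairing by the tensor product of exponent vectors. This is a toy model under stated assumptions (known discrete logarithms, the prime P = 1009, the bases X and Y, and the choice of projections as conjugated diagonal matrices are all illustrative); it is not a secure instantiation and is not taken verbatim from the paper.

```python
import random

P = 1009  # toy prime (assumption); exponent vectors live in F_P


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) % P for col in zip(*b)] for row in a]


def kron(a, b):
    return [[(x * y) % P for x in row_a for y in row_b] for row_a in a for row_b in b]


def inv2(m):
    """Inverse of a 2x2 matrix modulo P (assumes a non-zero determinant)."""
    (a, b), (c, d) = m
    det_inv = pow((a * d - b * c) % P, -1, P)
    return [[d * det_inv % P, -b * det_inv % P], [-c * det_inv % P, a * det_inv % P]]


def projection(basis):
    """Matrix of v -> v * basis^{-1} * diag(0, 1) * basis; it kills the first basis row."""
    u = [[0, 0], [0, 1]]
    return matmul(matmul(inv2(basis), u), basis)


def act(vec, matrix):
    """Apply a homomorphism, given as a matrix, to a row vector of exponents."""
    return matmul([vec], matrix)[0]


def pairing(x, y):
    """Exponent vector of e(g^x, h^y) in the toy model: x tensor y."""
    return kron([x], [y])[0]


# Rows of X are x_1, x_2; the subgroup G_1 is spanned by x_1 (hypothetical values).
X = [[1, 5], [0, 1]]
Y = [[1, 7], [0, 1]]
PI, PI_BAR = projection(X), projection(Y)
PI_T = kron(PI, PI_BAR)

rng = random.Random(1)
g = [rng.randrange(P) for _ in range(2)]
h = [rng.randrange(P) for _ in range(2)]

# Condition 3: pi_t(e(g, h)) = e(pi(g), pi_bar(h)).
commutes = act(pairing(g, h), PI_T) == pairing(act(g, PI), act(h, PI_BAR))
# Condition 2 (for G_1 and H_1): the subgroup generators are mapped to the identity.
kills_g1 = act(X[0], PI) == [0, 0]
kills_h1 = act(Y[0], PI_BAR) == [0, 0]
print(commutes, kills_g1, kills_h1)
```

The commutation in condition 3 holds here because of the mixed-product property of the Kronecker product, $(\vec{x} \otimes \vec{y})(M \otimes N) = (\vec{x}M) \otimes (\vec{y}N)$.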

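2.2 Subgroup Decision Assumption and k-Linear Assumption

Here we define the subgroup decision problem and the subgroup decision
assumption in the bilinear group setting, which were introduced by Freeman.

Definition 3. Let $\mathcal{G}$ be a bilinear group generator. We define the
advantage of an algorithm $\mathcal{A}$ in solving the subgroup decision problem
on the left, denoted by $Adv^{SDP_L}_{\mathcal{A},\mathcal{G}}(\lambda)$, as

$$\Big| \Pr\big[\mathcal{A}(G, G_1, H, H_1, G_t, e, g) \to 1 \mid (G, G_1, H, H_1, G_t, e) \xleftarrow{\$} \mathcal{G}(\lambda),\ g \xleftarrow{\$} G\big]$$
$$- \Pr\big[\mathcal{A}(G, G_1, H, H_1, G_t, e, g_1) \to 1 \mid (G, G_1, H, H_1, G_t, e) \xleftarrow{\$} \mathcal{G}(\lambda),\ g_1 \xleftarrow{\$} G_1\big] \Big|.$$

We say that $\mathcal{G}$ satisfies the subgroup decision assumption on the left
if, for any PPT algorithm $\mathcal{A}$, its
$Adv^{SDP_L}_{\mathcal{A},\mathcal{G}}(\lambda)$ is a negligible function of the
security parameter $\lambda$.

We analogously define the subgroup decision problem on the right, the advantage
$Adv^{SDP_R}_{\mathcal{A},\mathcal{G}}(\lambda)$ of $\mathcal{A}$, and the
subgroup decision assumption on the right by using $H$ and $H_1$ instead of $G$
and $G_1$.

Definition 4. We say that a bilinear group generator $\mathcal{G}$ satisfies the
subgroup decision assumption if $\mathcal{G}$ satisfies both the subgroup
decision assumption on the left and the subgroup decision assumption on the
right.

For a subgroup decision assumption in the prime-order group setting, we use the
widely known k-linear assumption, which was introduced by Hofheinz and Kiltz
and Shacham, in the bilinear group setting. We give the formal definition
of the k-linear assumption below.

Definition 5. Let $\mathcal{G}_1$ be a bilinear group generator of prime order
and $k \geq 1$. We define the advantage of an algorithm $\mathcal{A}$ in solving
the k-linear problem in $\mathbb{G}$, denoted by
$Adv^{k\text{-}Lin_{\mathbb{G}}}_{\mathcal{A},\mathcal{G}_1}(\lambda)$, to be

$$\Big| \Pr\big[\mathcal{A}(\mathbb{G}, \mathbb{H}, \mathbb{G}_t, \hat{e}, \mathfrak{g}, \mathfrak{u}_i, \mathfrak{u}_i^{a_i}, \mathfrak{g}^{b}, \mathfrak{h} \text{ for } i \in [1,k]) \to 1 \mid$$
$$(\mathbb{G}, \mathbb{H}, \mathbb{G}_t, \hat{e}) \xleftarrow{\$} \mathcal{G}_1(\lambda),\ \mathfrak{g}, \mathfrak{u}_i \xleftarrow{\$} \mathbb{G},\ \mathfrak{h} \xleftarrow{\$} \mathbb{H},\ a_i \xleftarrow{\$} \mathbb{F}_p \text{ for } i \in [1,k],\ b \xleftarrow{\$} \mathbb{F}_p\big]$$
$$- \Pr\big[\mathcal{A}(\mathbb{G}, \mathbb{H}, \mathbb{G}_t, \hat{e}, \mathfrak{g}, \mathfrak{u}_i, \mathfrak{u}_i^{a_i}, \mathfrak{g}^{b}, \mathfrak{h} \text{ for } i \in [1,k]) \to 1 \mid$$
$$(\mathbb{G}, \mathbb{H}, \mathbb{G}_t, \hat{e}) \xleftarrow{\$} \mathcal{G}_1(\lambda),\ \mathfrak{g}, \mathfrak{u}_i \xleftarrow{\$} \mathbb{G},\ \mathfrak{h} \xleftarrow{\$} \mathbb{H},\ a_i \xleftarrow{\$} \mathbb{F}_p \text{ for } i \in [1,k],\ b = \textstyle\sum_{i \in [1,k]} a_i\big] \Big|.$$

Then, we say that $\mathcal{G}_1$ satisfies the k-linear assumption in
$\mathbb{G}$ if, for any PPT algorithm $\mathcal{A}$,
$Adv^{k\text{-}Lin_{\mathbb{G}}}_{\mathcal{A},\mathcal{G}_1}(\lambda)$ is a
negligible function of the security parameter.

We can analogously define the k-linear assumption in $\mathbb{H}$. The 1-linear
assumption in $\mathbb{G}$ is the DDH assumption in $\mathbb{G}$ and the 2-linear
assumption in $\mathbb{G}$ is the decisional linear assumption in $\mathbb{G}$
[9].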
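The shape of a k-linear instance can be illustrated with a small sampler. The sketch below is an assumption-laden toy: it uses the order-P subgroup of squares modulo a safe prime Q instead of a pairing group, omits the element of $\mathbb{H}$, and uses parameters (P = 1019, Q = 2039) that are far too small to be secure. The function names are hypothetical.

```python
import random

# Toy parameters (assumptions for illustration only; far too small to be secure):
# Q = 2 * P + 1 is a safe prime, and the squares modulo Q form a group of prime order P.
P, Q = 1019, 2039
BASE = 4  # a square modulo Q, hence a generator of the order-P subgroup


def random_generator(rng):
    """A uniformly random non-identity element of the order-P subgroup."""
    return pow(BASE, rng.randrange(1, P), Q)


def sample_k_linear(k, real, rng):
    """Return (g, [u_i], [u_i^{a_i}], g^b) with b = sum(a_i) if real, else b random."""
    g = random_generator(rng)
    us = [random_generator(rng) for _ in range(k)]
    exps = [rng.randrange(P) for _ in range(k)]
    b = sum(exps) % P if real else rng.randrange(P)
    instance = (g, us, [pow(u, a, Q) for u, a in zip(us, exps)], pow(g, b, Q))
    return instance, exps  # exps is returned only so that the example can be checked


# k = 1 gives a DDH-style tuple, k = 2 a DLIN-style tuple.
ddh_instance, ddh_exps = sample_k_linear(1, True, random.Random(0))
dlin_instance, dlin_exps = sample_k_linear(2, True, random.Random(0))
print(ddh_instance)
print(dlin_instance)
```

A distinguisher for the k-linear problem would receive only the instance, never the exponents; the assumption states that no efficient algorithm can tell the `real=True` and `real=False` distributions apart in a suitable group.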

3 Impossibility Results of Projecting Bilinear Pairings 

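In this section, we first formally define natural product groups of prime-order
bilinear groups. Next, we derive conditions for projecting bilinear groups, and
then provide our impossibility results of projecting bilinear pairings. We begin
by defining some notation that will help us to simplify explanations. For group
elements $\mathfrak{g}, \mathfrak{g}_1, \ldots, \mathfrak{g}_{k+1} \in \mathbb{G}$,
a vector $\vec{a} = (a_1, \ldots, a_{k+1}) \in \mathbb{F}_p^{k+1}$, and a matrix
$M = (m_{i,j}) \in Mat_{(k+1) \times (k+1)}(\mathbb{F}_p)$, we use the notation

$$\mathfrak{g}^{\vec{a}} := (\mathfrak{g}^{a_1}, \ldots, \mathfrak{g}^{a_{k+1}}) \in \mathbb{G}^{k+1}$$

and

$$(\mathfrak{g}_1, \ldots, \mathfrak{g}_{k+1})^{M} := \Big( \prod_{i \in [1,k+1]} \mathfrak{g}_i^{m_{i,1}}, \ldots, \prod_{i \in [1,k+1]} \mathfrak{g}_i^{m_{i,k+1}} \Big).$$

From this notation, we can easily obtain
$(\mathfrak{g}^{\vec{a}})^{M} = \mathfrak{g}^{\vec{a}M}$.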

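3.1 Bilinear Groups Naturally Induced from k-linear Assumption

In Figure 1, we provide a generator
$\mathcal{G}_k^{\{A_\ell\}_{\ell \in [1,m]}}$ for
$A_\ell \in Mat_{(k+1) \times (k+1)}(\mathbb{F}_p)$ and $\ell \in [1,m]$. When we
refer to the natural construction of product groups of prime-order bilinear
groups such that the subgroup decision assumption “naturally” follows from the
k-linear assumption, we mean $\mathcal{G}_k^{\{A_\ell\}_{\ell \in [1,m]}}$. When
the subgroup decision assumption is induced from the k-linear assumption, so
that, given $g$, it is hard to determine whether $g \leftarrow G_1$ or
$g \leftarrow G$, the group $G$ is a rank-$(k+1)$ $\mathbb{F}_p$-module and
$G_1$ is a randomly chosen rank-$k$ submodule of $G$. For any matrices
$A_1, \ldots, A_m$ in $Mat_{(k+1) \times (k+1)}(\mathbb{F}_p)$, a group generator
$\mathcal{G}_k^{\{A_\ell\}_{\ell \in [1,m]}}$ satisfies the subgroup decision
assumption if the underlying prime-order bilinear group generator
$\mathcal{G}_1$ satisfies the k-linear assumption.

4 Meiklejohn et al. [25] also used the word “natural” to refer to
$\mathcal{G}_k^{\{A_\ell\}_{\ell \in [1,m]}}$. They used
$\mathcal{G}_k^{\{A_\ell\}_{\ell \in [1,m]}}$ to show the limitation result of
both projecting and cancelling: They showed that for any $A_\ell$ matrices used
in $\mathcal{G}_k^{\{A_\ell\}_{\ell \in [1,m]}}$,
$\mathcal{G}_k^{\{A_\ell\}_{\ell \in [1,m]}}$ cannot be both projecting and
cancelling with overwhelming probability, where the probability goes over the
randomness used in $\mathcal{G}_k^{\{A_\ell\}_{\ell \in [1,m]}}$.

1. $\mathcal{G}_k^{\{A_\ell\}_{\ell \in [1,m]}}$ takes the security parameter
$\lambda$ as input.

2. Run $\mathcal{G}_1(\lambda) \to (p, \mathbb{G}, \mathbb{H}, \mathbb{G}_t, \hat{e})$.

3. Define $G = \mathbb{G}^{k+1}$, $H = \mathbb{H}^{k+1}$, and
$G_t = \mathbb{G}_t^{m}$.

4. Randomly choose
$\vec{x}_1, \ldots, \vec{x}_k, \vec{y}_1, \ldots, \vec{y}_k \in \mathbb{F}_p^{k+1}$
such that the sets $\{\vec{x}_i\}_{i \in [1,k]}$ and
$\{\vec{y}_i\}_{i \in [1,k]}$ are each linearly independent.

5. Randomly choose generators $\mathfrak{g} \in \mathbb{G}$ and
$\mathfrak{h} \in \mathbb{H}$, and let
$G_1 = \langle \mathfrak{g}^{\vec{x}_1}, \ldots, \mathfrak{g}^{\vec{x}_k} \rangle$
and
$H_1 = \langle \mathfrak{h}^{\vec{y}_1}, \ldots, \mathfrak{h}^{\vec{y}_k} \rangle$.

6. Define a map $e : G \times H \to G_t$ as an $m$-tuple of maps
$e(\cdot, \cdot)_\ell$ for $\ell \in [1,m]$ as follows:

$$e((\mathfrak{g}_1, \ldots, \mathfrak{g}_{k+1}), (\mathfrak{h}_1, \ldots, \mathfrak{h}_{k+1}))_\ell := \prod_{i,j \in [1,k+1]} \hat{e}(\mathfrak{g}_i, \mathfrak{h}_j)^{a^{(\ell)}_{i,j}},$$

where $A_\ell = (a^{(\ell)}_{i,j}) \in Mat_{(k+1) \times (k+1)}(\mathbb{F}_p)$
for $\ell \in [1,m]$.

7. Output a description of $(p, G, G_1, H, H_1, G_t, e)$; each group description
has its generators only (e.g., $G_1$'s description has
$\mathfrak{g}^{\vec{x}_1}, \ldots, \mathfrak{g}^{\vec{x}_k}$, but $\vec{x}_i$ is
not contained in the description of $G_1$).

Fig. 1. Description of $\mathcal{G}_k^{\{A_\ell\}_{\ell \in [1,m]}}$

Theorem 1. (Theorem 2.5 of the cited work) If $\mathcal{G}_1$ satisfies the
k-linear assumption in $\mathbb{G}$ and $\mathbb{H}$,
$\mathcal{G}_k^{\{A_\ell\}_{\ell \in [1,m]}}$ satisfies the subgroup decision
assumption regardless of the choice of $\{A_\ell\}_{\ell \in [1,m]}$.

Note that $\mathcal{G}_k^{\{A_\ell\}_{\ell \in [1,m]}}$ contains Groth-Sahai's
constructions based on the DDH assumption ($k = 1$) and the DLIN assumption
($k = 2$).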

3.2 Conditions for Symmetric Property 

A bilinear pairing $e$ of $\mathcal{G}_k^{\{A_\ell\}_{\ell \in [1,m]}}$ in
Figure 1 can be rewritten, using matrix notation, as

$$e(\mathfrak{g}^{\vec{x}}, \mathfrak{h}^{\vec{y}})_\ell = \hat{e}(\mathfrak{g}, \mathfrak{h})^{\vec{x} A_\ell \vec{y}^{\,t}},$$

where $\vec{x}$ is considered to be a $1 \times (k+1)$ matrix, and
$\vec{y}^{\,t}$ is considered to be a $(k+1) \times 1$ matrix.

If $\mathcal{G}_1$ is a symmetric bilinear group generator of prime order, then
one may think that $\mathcal{G}_k^{\{A_\ell\}_{\ell \in [1,m]}}$ is also a
symmetric bilinear group generator. However, not all bilinear groups with
underlying symmetric bilinear pairings $\hat{e}$ satisfy the symmetric property.
The following theorem shows the necessary and sufficient condition on
$\{A_\ell\}_{\ell \in [1,m]}$ for $\mathcal{G}_k^{\{A_\ell\}_{\ell \in [1,m]}}$
to be symmetric, that is, $e(g, h) = e(h, g)$ for any group elements $g$ and $h$.

Theorem 2. $\mathcal{G}_k^{\{A_\ell\}_{\ell \in [1,m]}}$ is symmetric if and only
if $\mathbb{G} = \mathbb{H}$, $\mathfrak{g} = \mathfrak{h}$,
$\vec{x}_i = \vec{y}_i$ for all $i \in [1,k]$, and $A_\ell$ is symmetric for all
$\ell \in [1,m]$, where $\vec{x}_i$ and $\vec{y}_i$ are defined in the
description of $\mathcal{G}_k^{\{A_\ell\}_{\ell \in [1,m]}}$.

Because of space constraints, we give the proof of Theorem 2 in the full version
of this paper.


3.3 Necessary Condition for Projection Property 

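Using a tensor product $\otimes$, we can further simplify the computation of $e$
as follows: Let $B$ be a $(k+1)^2 \times m$ matrix such that $B$'s
$((i-1)(k+1)+j, \ell)$ entry is $a^{(\ell)}_{i,j}$, where
$A_\ell = (a^{(\ell)}_{i,j})$. Then,

$$e(\mathfrak{g}^{\vec{x}}, \mathfrak{h}^{\vec{y}}) = \big(e(\mathfrak{g}^{\vec{x}}, \mathfrak{h}^{\vec{y}})_1, \ldots, e(\mathfrak{g}^{\vec{x}}, \mathfrak{h}^{\vec{y}})_m\big)$$
$$= \big(\hat{e}(\mathfrak{g}, \mathfrak{h})^{\vec{x} A_1 \vec{y}^{\,t}}, \ldots, \hat{e}(\mathfrak{g}, \mathfrak{h})^{\vec{x} A_m \vec{y}^{\,t}}\big) = \hat{e}(\mathfrak{g}, \mathfrak{h})^{(\vec{x} \otimes \vec{y})B}.$$

From now on, we use the notation $\mathcal{G}_k^{B}$ as well as
$\mathcal{G}_k^{\{A_\ell\}_{\ell \in [1,m]}}$ to denote a bilinear group
generator naturally induced from the k-linear assumption, where $B$ is defined
by $\{A_\ell\}_{\ell \in [1,m]}$ as above. This notation is well-defined since
there is a one-to-one correspondence between $B$ and
$\{A_\ell\}_{\ell \in [1,m]}$.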
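The passage from the matrices $A_\ell$ to the single matrix $B$ can be sanity-checked in the exponent. The following Python sketch is a minimal illustration under assumptions: the prime P = 10007, the random matrices, and the function names are hypothetical, and only exponents (not actual group elements) are computed.

```python
import random

P = 10007  # toy prime (assumption)


def build_B(mats):
    """Row (i-1)(k+1)+j (1-indexed) and column l of B hold the (i, j) entry of A_l."""
    n = len(mats[0])
    return [[a[i][j] % P for a in mats] for i in range(n) for j in range(n)]


def exponents_via_matrices(x, y, mats):
    """Exponents x * A_l * y^t for each l, as in the matrix form of the pairing."""
    n = len(x)
    return [sum(x[i] * a[i][j] * y[j] for i in range(n) for j in range(n)) % P for a in mats]


def exponents_via_B(x, y, B):
    """Exponents (x tensor y) * B, the tensor-product form of the same pairing."""
    t = [(xi * yj) % P for xi in x for yj in y]
    return [sum(t[r] * B[r][l] for r in range(len(t))) % P for l in range(len(B[0]))]


rng = random.Random(7)
k, m = 2, 4
mats = [[[rng.randrange(P) for _ in range(k + 1)] for _ in range(k + 1)] for _ in range(m)]
x = [rng.randrange(P) for _ in range(k + 1)]
y = [rng.randrange(P) for _ in range(k + 1)]
B = build_B(mats)
same = exponents_via_matrices(x, y, mats) == exponents_via_B(x, y, B)
print(len(B), len(B[0]), same)
```

Each non-zero entry of $B$ corresponds to one use of the underlying pairing $\hat{e}$, which is why the later lower bounds are phrased in terms of the rows and the rank of $B$.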

We give a necessary condition on $B$ for $\mathcal{G}_k^{B}$ to be projecting in
Lemma 1. This lemma says that if $G = G_1 \oplus G_2$ and
$H = H_1 \oplus H_2$, then $e(G_2, H_2)$ should have at least one element not
contained in the subgroup generated by the other parts of the image.

Lemma 1. 1. If $\mathcal{G}_k^{B}$ is asymmetric (that is,
$\mathcal{G}_k^{B} \to (p, G, G_1, H, H_1, G_t, e)$) and projecting, then for any
decompositions $G = G_1 \oplus G_2$ and $H = H_1 \oplus H_2$ it satisfies
$e(G_2, H_2) \not\subset \mathfrak{B}$, where $\mathfrak{B}$ is the smallest
group containing $e(G_1, H)$ and $e(G, H_1)$.

2. If $\mathcal{G}_k^{B}$ is symmetric (that is,
$\mathcal{G}_k^{B} \to (p, G, G_1, G_t, e)$) and projecting, then for any
decomposition $G = G_1 \oplus G_2$ it satisfies
$e(G_2, G_2) \not\subset \mathfrak{B}$, where $\mathfrak{B}$ is the smallest
group containing $e(G, G_1)$.

Proof. (1) Suppose that $\mathcal{G}_k^{B}$ is projecting. Then, there exist
three homomorphisms $\pi$, $\bar{\pi}$, and $\pi_t$. Since $\pi$ and $\bar{\pi}$
are non-trivial homomorphisms, $G_1$ and $H_1$ are proper subgroups of $G$ and
$H$, respectively. Since $G_1$ and $H_1$ are proper subgroups, for any
decompositions $G = G_1 \oplus G_2$ and $H = H_1 \oplus H_2$,
$\{1_G\} \neq G_2 \subset G$ and $\{1_H\} \neq H_2 \subset H$. We show that
$G_1$, $G_2$, $H_1$, and $H_2$ satisfy the condition in the lemma. By definition,
$\mathfrak{B}$ is the group generated by all elements in $e(G_1, H)$ and
$e(G, H_1)$, so every element in $\mathfrak{B}$ can be written as a product of
elements in $e(G_1, H)$ and $e(G, H_1)$ (though not uniquely). For any
$g_1 \in G_1$, $h_1 \in H_1$, $g \in G$, and $h \in H$,
$\pi_t(e(g_1, h)e(g, h_1))$ is equal to $1_t$ since

$$\pi_t(e(g_1, h))\pi_t(e(g, h_1)) = e(\pi(g_1), \bar{\pi}(h))\,e(\pi(g), \bar{\pi}(h_1)) = e(1_G, \bar{\pi}(h))\,e(\pi(g), 1_H).$$

By the homomorphic property of $\pi_t$, we can see that
$\pi_t(\mathfrak{B}) = \{1_t\}$. If $e(G_2, H_2) \subset \mathfrak{B}$, then
$e(G, H) \subset \mathfrak{B} \subset \ker(\pi_t)$. This contradicts the
non-triviality condition on $\pi_t$.

(2) The second statement can be proved similarly to (1); the essential idea of
the proof is the same. Thus, we omit it. □

For our impossibility results regarding the image size and computational cost,
we will focus on the $(k+1)^2 \times m$ matrix $B$ of $\mathcal{G}_k^{B}$. All
non-zero entries in $B$ imply $\hat{e}$-computations (the bilinear pairing
$\hat{e}$ of the underlying bilinear group generator $\mathcal{G}_1$), and the
lower bound of $m$ implies the lower bound of the image size of bilinear
pairings. We compute the lower bound of the rank of $B$ of $\mathcal{G}_k^{B}$,
where $\mathcal{G}_k^{B}$ is asymmetric and projecting, by using the necessary
condition of the projecting property in Lemma 1. For projecting symmetric
bilinear pairings, the overall strategy is similar to that for projecting
asymmetric bilinear pairings except that symmetric bilinear pairings have the
special form of $B$ as mentioned in Theorem 2. We give the formal statement
below.

Lemma 2. The following statements about $\mathcal{G}_k^{B}$ are true with
overwhelming probability, where the probability goes over the randomness used in
$\mathcal{G}_k^{B}$.

1. If $\mathcal{G}_k^{B}$ is asymmetric and projecting, then $B$ has $(k+1)^2$
linearly independent rows.

2. If $\mathcal{G}_k^{B}$ is symmetric and projecting, then $B$ has
$\frac{(k+1)(k+2)}{2}$ linearly independent rows.

Proof. (1) Let $\mathcal{G}_k^{B}$ be a projecting asymmetric bilinear group
generator. Let $(G, G_1, H, H_1, G_t, e)$ be the output of $\mathcal{G}_k^{B}$
and let $G$ and $H$ be decomposed as $G = G_1 \oplus G_2$ and
$H = H_1 \oplus H_2$, respectively, for some subgroups $G_2$ and $H_2$. Then,
$G_1 = \langle \mathfrak{g}^{\vec{x}_1}, \ldots, \mathfrak{g}^{\vec{x}_k} \rangle$,
$H_1 = \langle \mathfrak{h}^{\vec{y}_1}, \ldots, \mathfrak{h}^{\vec{y}_k} \rangle$,
$G_2 = \langle \mathfrak{g}^{\vec{x}_{k+1}} \rangle$, and
$H_2 = \langle \mathfrak{h}^{\vec{y}_{k+1}} \rangle$ for some sets of linearly
independent vectors $\{\vec{x}_i\}_{i \in [1,k+1]}$ and
$\{\vec{y}_i\}_{i \in [1,k+1]}$. Let $X$ be a $(k+1) \times (k+1)$ matrix over
$\mathbb{F}_p$ with $\vec{x}_i$ as its $i$-th row, and $Y$ be a
$(k+1) \times (k+1)$ matrix over $\mathbb{F}_p$ with $\vec{y}_i$ as its $i$-th
row. Note that $X$ and $Y$ are invertible. Since $B$ is a $(k+1)^2 \times m$
matrix for some $m$, $B$ can have at most $(k+1)^2$ linearly independent rows.

Suppose that $B$ has fewer than $(k+1)^2$ linearly independent rows. We observe
that

$$e(G_2, H_2) = \langle e(\mathfrak{g}^{\vec{x}_{k+1}}, \mathfrak{h}^{\vec{y}_{k+1}}) \rangle = \langle \hat{e}(\mathfrak{g}, \mathfrak{h})^{(\vec{x}_{k+1} \otimes \vec{y}_{k+1})B} \rangle = \langle \hat{e}(\mathfrak{g}, \mathfrak{h})^{\vec{e}_{(k+1)^2}(X \otimes Y)B} \rangle$$

and similarly

$$\mathfrak{B} = \langle \hat{e}(\mathfrak{g}, \mathfrak{h})^{\vec{e}_1 (X \otimes Y)B}, \ldots, \hat{e}(\mathfrak{g}, \mathfrak{h})^{\vec{e}_{(k+1)^2 - 1}(X \otimes Y)B} \rangle,$$

where $\vec{e}_i$ is the $i$-th canonical vector of $\mathbb{F}_p^{(k+1)^2}$.
Now, we show that there exists a non-zero vector
$\vec{c} \in \mathbb{F}_p^{(k+1)^2}$ with a non-zero $(k+1)^2$-th entry such
that $\vec{c} \cdot (X \otimes Y)B = \vec{0} \in \mathbb{F}_p^{m}$. The existence
of such a vector $\vec{c}$ implies that the $(k+1)^2$-th row of
$(X \otimes Y)B$ can be represented as a linear combination of the upper rows of
$(X \otimes Y)B$, so that $e(G_2, H_2) \subset \mathfrak{B}$. This would
contradict Lemma 1.

By hypothesis ($rank(B) < (k+1)^2$), there exists a non-zero vector
$\vec{r} \in \mathbb{F}_p^{(k+1)^2}$ such that
$\vec{r}B = \vec{0} \in \mathbb{F}_p^{m}$. For such an $\vec{r}$, we show that
$\vec{r}(X^{-1} \otimes Y^{-1})$ satisfies the conditions required of $\vec{c}$
above. First, we obtain
$\vec{r}(X^{-1} \otimes Y^{-1}) \cdot (X \otimes Y)B = \vec{r}B = \vec{0}$. Next,
we argue that the $(k+1)^2$-th entry of $\vec{r}(X^{-1} \otimes Y^{-1})$ is
non-zero with overwhelming probability, where the probability goes over the
randomness used in $\mathcal{G}_k^{B}$ (to choose
$\vec{x}_1, \ldots, \vec{x}_k, \vec{y}_1, \ldots, \vec{y}_k$). We consider the
$(k+1)$-th column vector $\vec{x}^{\,t}$ of $X^{-1}$; then $\vec{x}$ is
orthogonal to all upper $k$ rows of $X$. Denote the orthogonal complement of
$\langle \vec{x}_1, \ldots, \vec{x}_k \rangle$ by
$\langle \vec{x}^{\perp} \rangle$. Then, $\vec{x}$ is a non-zero vector in
$\langle \vec{x}^{\perp} \rangle$. By definition of $\mathcal{G}_k^{B}$,
$\vec{x}_1, \ldots, \vec{x}_k$ are randomly chosen so that $\vec{x}^{\perp}$ is
also uniformly distributed in $\mathbb{F}_p^{k+1}$. Similarly, the $(k+1)$-th
column vector $\vec{y}^{\,t}$ of $Y^{-1}$ is a non-zero vector in
$\langle \vec{y}_1, \ldots, \vec{y}_k \rangle^{\perp} := \langle \vec{y}^{\perp} \rangle$,
and $\vec{y}^{\perp}$ is uniformly distributed in $\mathbb{F}_p^{k+1}$. The
$(k+1)^2$-th entry of $\vec{r}(X^{-1} \otimes Y^{-1})$ is
$\vec{r}(\vec{x} \otimes \vec{y})^{t}$, and it is a non-zero constant multiple of
$\vec{r}(\vec{x}^{\perp} \otimes \vec{y}^{\perp})^{t}$. By the first statement of
Lemma 3, which is given below,
$\vec{r}(\vec{x}^{\perp} \otimes \vec{y}^{\perp})^{t}$ is non-zero with
overwhelming probability. Therefore, we complete the proof of the first
statement of the lemma.

(2) We can prove the second statement of the lemma by using the second
statements of Lemma 1 and Lemma 3. The overall strategy is the same as in the
proof of the first statement. The key observation in the proof of the second
statement is that $B$ has a special form due to Theorem 2. We leave the details
of the proof of the second statement to the full version. □

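Lemma 3. Let $V$ be a subspace of $\mathbb{F}_p^{(k+1)^2}$ generated by
$\{\vec{e}_{i,j}\}_{1 \leq i < j \leq k+1}$, where $\vec{e}_{i,j}$ is a vector
with $1$ in the $((i-1)(k+1)+j)$-th entry, $-1$ in the $((j-1)(k+1)+i)$-th
entry, and zeros elsewhere.

1. For any non-zero vector $\vec{r} \in \mathbb{F}_p^{(k+1)^2}$,
$\Pr[\vec{r} \cdot (\vec{x} \otimes \vec{y})^{t} = 0] \leq \frac{2}{p}$, where
the probability goes over the choice of vectors
$\vec{x}, \vec{y} \in \mathbb{F}_p^{k+1}$.

2. For any vector $\vec{r} \in \mathbb{F}_p^{(k+1)^2} \setminus V$,
$\Pr[\vec{r} \cdot (\vec{x} \otimes \vec{x})^{t} = 0] \leq \frac{2}{p}$, where
the probability goes over the choice of a vector
$\vec{x} \in \mathbb{F}_p^{k+1}$.

We can prove Lemma 3 by using the Schwartz-Zippel lemma and leave a
detailed proof to the full version.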
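For very small parameters, the two probabilities in Lemma 3 can be computed exhaustively. The sketch below assumes the generic Schwartz-Zippel bound of $2/p$ for a non-zero polynomial of total degree 2 and checks it for the toy values p = 5 and k = 1; these parameters and all function names are illustrative assumptions, and the check says nothing about how the full proof is organized.

```python
from fractions import Fraction
from itertools import product

P = 5  # toy prime, chosen small enough for exhaustive enumeration (assumption)
N = 2  # N = k + 1 with k = 1
VECTORS = list(product(range(P), repeat=N))


def form(r, x, y):
    """Evaluate r . (x tensor y)^t modulo P."""
    return sum(r[i * N + j] * x[i] * y[j] for i in range(N) for j in range(N)) % P


def zero_prob_two_inputs(r):
    """Exact probability that r . (x tensor y)^t = 0 over uniform x and y."""
    zeros = sum(form(r, x, y) == 0 for x in VECTORS for y in VECTORS)
    return Fraction(zeros, len(VECTORS) ** 2)


def zero_prob_one_input(r):
    """Exact probability that r . (x tensor x)^t = 0 over uniform x."""
    zeros = sum(form(r, x, x) == 0 for x in VECTORS)
    return Fraction(zeros, len(VECTORS))


def in_V(r):
    """Membership in V: zero diagonal entries and r_(i,j) = -r_(j,i)."""
    return all((r[i * N + j] + r[j * N + i]) % P == 0 for i in range(N) for j in range(i, N))


ALL_R = list(product(range(P), repeat=N * N))
worst_two = max(zero_prob_two_inputs(r) for r in ALL_R if any(r))
worst_one = max(zero_prob_one_input(r) for r in ALL_R if not in_V(r))
print(worst_two <= Fraction(2, P), worst_one <= Fraction(2, P))
```

Vectors in $V$ are excluded from the second statement because they define the zero polynomial once both inputs coincide: for such $\vec{r}$ the sketch's `zero_prob_one_input` returns 1.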


3.4 Impossibility of Projecting Property 

Based on Lemma 2, we derive our main theorem on the impossibility results of
projecting bilinear pairings. We begin by explaining our computational model
for the lower bounds of the computational cost of projecting bilinear pairings.
In our computational model, we assume two things: First, one who computes
projecting bilinear pairings $e$ cannot utilize the representation of the
underlying bilinear pairing $\hat{e}$ and the groups $\mathbb{G}$, $\mathbb{H}$,
and $\mathbb{G}_t$ over which $\hat{e}$ is defined. Note that we rule out
techniques for multi-pairings in our computational model. This assumption is the
same as that of the generic group model [37], in particular, the generic
bilinear group model [8]. In [37,8], the generic (bilinear) group model is used
to show computational lower bounds for attackers solving number-theoretic
problems such as the discrete logarithm problem and the $q$-strong
Diffie-Hellman problem. Second, the two inputs are uniformly and independently
chosen so that any relations between the two inputs are unknown. In special
cases in which a relation between the two inputs is known, there are several
alternative ways to compute bilinear pairings. For example, one knowing $g_1$,
$h_1$, $e(g, h)$, and the relations $g_1 = g^2$ and $h_1 = h^3$ can compute
$e(g_1, h_1)$ by computing $e(g, h)^6$ instead of performing a bilinear pairing.
Since we want to consider the computational cost of $e$ without using any
additional information about the two inputs, we assume that the two inputs are
uniformly and independently distributed in their respective domains. We provide
our main theorem below.

Theorem 3. (Lower Bounds) The following statements about $\mathcal{G}_k^{B}$ are
true with overwhelming probability, where the probability goes over the
randomness used in $\mathcal{G}_k^{B}$.

1. The image size of a projecting asymmetric bilinear pairing is at least
$(k+1)^2$ elements in $\mathbb{G}_t$.

2. The image size of a projecting symmetric bilinear pairing is at least
$\frac{(k+1)(k+2)}{2}$ elements in $\mathbb{G}_t$.

3. Any construction for a projecting (asymmetric or symmetric) bilinear pairing
should perform at least $(k+1)^2$ computations of $\hat{e}$ in our computational
model.

Proof. (1) Suppose that $\mathcal{G}_k^{B}$ is asymmetric and projecting. Since
the $(k+1)^2 \times m$ matrix $B$ has at least $(k+1)^2$ linearly independent
rows by Lemma 2, $m \geq (k+1)^2$. This implies that an element of
$G_t = \mathbb{G}_t^{m}$ consists of $m$ ($\geq (k+1)^2$) elements in
$\mathbb{G}_t$.

(2) If $\mathcal{G}_k^{B}$ is symmetric and projecting, then the
$(k+1)^2 \times m$ matrix $B$ has at least $\frac{(k+1)(k+2)}{2}$ linearly
independent rows by Lemma 2. Thus, $m \geq \frac{(k+1)(k+2)}{2}$; hence, an
element in $G_t = \mathbb{G}_t^{m}$ consists of $m$
($\geq \frac{(k+1)(k+2)}{2}$) elements in $\mathbb{G}_t$.

(3) First, we show that for two inputs
$g = (\mathfrak{g}_1, \ldots, \mathfrak{g}_{k+1}) \in G$ and
$h = (\mathfrak{h}_1, \ldots, \mathfrak{h}_{k+1}) \in H$, projecting (asymmetric
or symmetric) pairings require computing $\hat{e}(\mathfrak{g}_i, \mathfrak{h}_j)$
for all $i, j \in [1, k+1]$. To this end, it is sufficient to show that every row
in the matrix $B$ is non-zero. (Recall that
$e(\mathfrak{g}^{\vec{w}}, \mathfrak{h}^{\vec{z}}) = \hat{e}(\mathfrak{g}, \mathfrak{h})^{(\vec{w} \otimes \vec{z})B}$,
and if every row in $B$ is non-zero, then
$\hat{e}(\mathfrak{g}^{w_i}, \mathfrak{h}^{z_j})$ should be computed at least one
time.) If a group generator $\mathcal{G}_k^{B}$ is projecting and asymmetric,
then the rank of $B$ is $(k+1)^2$ by Lemma 2. Since $B$ has $(k+1)^2$ rows,
there are no zero rows. If a group generator $\mathcal{G}_k^{B}$ is projecting
and symmetric, then the rank of $B$ is $\frac{(k+1)(k+2)}{2}$ by Lemma 2. We know
that the matrix $B$ of symmetric bilinear group generators has the special form
given by Theorem 2. From Theorem 2, some rows in $B$ have respective identical
rows in $B$. Since $B$ has $(k+1)^2$ rows and $(k+1)^2 - \frac{k(k+1)}{2}$ is
equal to the rank of $B$, every row in $B$ has at least one non-zero entry.

Next, we show that computing $\hat{e}(\mathfrak{g}_i, \mathfrak{h}_j)$ cannot in
general be substituted by a product of other
$\hat{e}(\mathfrak{g}_{i'}, \mathfrak{h}_{j'})$ for
$i' \in [1, k+1] \setminus \{i\}$ and $j' \in [1, k+1] \setminus \{j\}$ in our
computational model. To this end, it is sufficient to show that for any non-zero
vector $\vec{r} = (r_1, \ldots, r_{(k+1)^2}) \in \mathbb{F}_p^{(k+1)^2}$,

$$\Pr_{g \xleftarrow{\$} G,\ h \xleftarrow{\$} H}\Big[ \prod_{i,j \in [1,k+1]} \hat{e}(\mathfrak{g}_i, \mathfrak{h}_j)^{r_{(i-1)(k+1)+j}} = 1_{\mathbb{G}_t} \Big] \approx 0.$$

For two random inputs $\mathfrak{g}^{\vec{w}}$ and $\mathfrak{h}^{\vec{z}}$,

$$\prod_{i,j \in [1,k+1]} \hat{e}(\mathfrak{g}^{w_i}, \mathfrak{h}^{z_j})^{r_{(i-1)(k+1)+j}} = \hat{e}(\mathfrak{g}, \mathfrak{h})^{(\vec{w} \otimes \vec{z})\vec{r}^{\,t}},$$

where $\vec{w} = (w_1, \ldots, w_{k+1}) \in \mathbb{F}_p^{k+1}$ and
$\vec{z} = (z_1, \ldots, z_{k+1}) \in \mathbb{F}_p^{k+1}$. Since $\vec{r}$ is a
non-zero vector in $\mathbb{F}_p^{(k+1)^2}$,
$(\vec{w} \otimes \vec{z})\vec{r}^{\,t} \neq 0$ with overwhelming probability by
Lemma 3, and hence we obtain the desired result that

$$\prod_{i,j \in [1,k+1]} \hat{e}(\mathfrak{g}^{w_i}, \mathfrak{h}^{z_j})^{r_{(i-1)(k+1)+j}} \neq 1_{\mathbb{G}_t}$$

with overwhelming probability.

Therefore, all projecting bilinear pairings require at least $(k+1)^2$
$\hat{e}$-computations. □
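For concrete parameters, the bounds of Theorem 3 are easy to tabulate. The short Python sketch below simply evaluates the three formulas for the DDH case ($k = 1$) and the DLIN case ($k = 2$); the function and key names are hypothetical.

```python
def lower_bounds(k):
    """Lower bounds of Theorem 3 for the k-linear setting (sizes counted in elements of G_t)."""
    return {
        "asymmetric_image_size": (k + 1) ** 2,
        "symmetric_image_size": (k + 1) * (k + 2) // 2,
        "pairing_computations": (k + 1) ** 2,
    }


for k, name in [(1, "DDH"), (2, "DLIN")]:
    print(name, lower_bounds(k))
```

For instance, under the DLIN assumption a projecting symmetric pairing needs an image of at least 6 target-group elements and at least 9 computations of the underlying pairing.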

4 Optimal Projecting Bilinear Pairings 

In this section, we show that our lower bounds are tight; for projecting asym- 
metric bilinear pairing, we show that Groth-Sahai and Freeman's constructions 
are optimal (in our computational model), and for projecting symmetric bilin- 
ear pairing, we propose a new construction achieving optimal efficiency (in our 
computational model). 

Definition 6. Let $\mathcal{G}_k^B$ be a projecting asymmetric (symmetric, resp.) bilinear
group generator. If the bilinear pairing $e$ consists of $(k+1)^2$ e-computations in
our computational model and $G_t = \mathbb{G}_t^{(k+1)^2}$
($G_t = \mathbb{G}_t^{(k+1)(k+2)/2}$, resp.), we say that $\mathcal{G}_k^B$ is optimal.

We can define $\mathcal{G}_k^B$ by defining a $(k+1)^2 \times m$ matrix $B$, or equivalently a set
of $(k+1) \times (k+1)$ matrices $\{A_\ell\}_{\ell \in [1,m]}$. For a projecting asymmetric bilinear
group generator, we define $B$ as $I_{(k+1)^2}$, where $I_{(k+1)^2}$ is the identity matrix
in $GL_{(k+1)^2}(\mathbb{F}_p)$. Note that $\mathcal{G}_k^{I_{(k+1)^2}}$ is exactly equal to Freeman's
projecting asymmetric bilinear group generator [?]. (We can easily check that
$\mathcal{G}_k^{I_{(k+1)^2}}$ does not satisfy the symmetric property due to Theorem ??.) Theorem 3
implies that $\mathcal{G}_k^{I_{(k+1)^2}}$ is optimal. Therefore, we obtain the following theorem.

Theorem 4. $\mathcal{G}_k^{I_{(k+1)^2}}$ is an optimal projecting asymmetric bilinear group
generator.

$\mathcal{G}_k^{I_{(k+1)^2}}$ covers one of the most interesting cases, $k = 1$:
$\mathcal{G}_1^{I_4}$ is optimal.¹

4.1 Optimal Projecting Symmetric Bilinear Pairings 

We propose an optimal projecting symmetric bilinear group generator $\mathcal{G}_k^B$ by
defining $B$ (equivalently $A_1, \ldots, A_m$). Let a set $S$ be
$\{(i, j) \in [1, k+1] \times [1, k+1] \mid 1 \le j \le i \le k+1\}$. We consider a map
$\tau : S \to [1, \frac{(k+1)(k+2)}{2}]$ defined by
$(i, j) \mapsto \frac{i(i-1)}{2} + j$.

Lemma 4. $\tau$ is a bijective map.

We give the proof of Lemma 4 in the full version.


Description of $A_\ell$ (equivalently $B$) for optimal projecting symmetric
bilinear pairings: Let $\tau^{-1}(\ell) = (i, j)$. For each $\ell \in [1, \frac{(k+1)(k+2)}{2}]$,
$A_\ell = (a^{(\ell)}_{s,t})$ is defined as a $(k+1) \times (k+1)$ matrix with

  1 in the entry $(i, j)$ and zeros elsewhere, if $i = j$;

  1 in the entries $(i, j)$ and $(j, i)$, and zeros elsewhere, otherwise.


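We give an example to illustrate the proposal.


Example 1. For $k = 2$, define

$$A_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},\quad
A_2 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},\quad
A_3 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$

$$A_4 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix},\quad
A_5 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix},\quad
A_6 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

□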

Define $B$ as a $(k+1)^2 \times \frac{(k+1)(k+2)}{2}$ matrix such that $B$'s
$((s-1)(k+1) + t, \ell)$ entry is $a^{(\ell)}_{s,t}$ for $s, t \in [1, k+1]$ and
$\ell \in [1, \frac{(k+1)(k+2)}{2}]$. (Then, we implicitly define
$G_t = \mathbb{G}_t^{(k+1)(k+2)/2}$.) By using the matrix $B$, we can construct a bilinear
group generator $\mathcal{G}_k^B$.

Next, we show that a group generator $\mathcal{G}_k^B$, where $B$ is defined as above, is an
optimal projecting symmetric bilinear group generator. The following Theorem 5
provides the desired result.


¹ Freeman used the notation $\mathcal{G}_P$, which is equivalent to our notation $\mathcal{G}_1^{I_4}$.

Theorem 5. Let $\mathcal{G}_k^B$ be a bilinear group generator with restrictions such that
$G = H$, $g = h$, $\vec{x}_i = \vec{y}_i$ for all $i \in [1, k]$, and $B$ is a
$(k+1)^2 \times \frac{(k+1)(k+2)}{2}$ matrix defined as above. Then, $\mathcal{G}_k^B$ is an
optimal projecting symmetric bilinear group generator with overwhelming probability,
where the probability goes over the randomness used in $\mathcal{G}_k^B$.

We leave the proof of Theorem 5 to the full version.

Our definition of projecting requires only the existence of homomorphisms
satisfying some conditions. However, some applications (e.g., the Boneh-Goh-Nissim
cryptosystem [?]) require that such homomorphisms be efficiently computable.
We show how to construct efficiently computable homomorphisms (precisely,
natural projections) satisfying the projecting property in the full version.

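Example 2. For $k = 2$, we can construct an optimal projecting symmetric bilinear
group generator by using the matrices in Example 1. We denote such a
bilinear group generator by $\mathcal{G}_2^{B^*}$, where $B^*$ is a $9 \times 6$ matrix defined by the
$A_1, \ldots, A_6$ matrices in Example 1.

$$B^* = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}$$
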


By Theorem 5, $\mathcal{G}_2^{B^*}$ is optimal projecting symmetric: Since $B^*$ is a $9 \times 6$ matrix,
the target group $G_t$ is equal to $\mathbb{G}_t^6$. Moreover, $B^*$ has nine 1's in the entries and
zeros elsewhere, so that the bilinear pairing $e$ requires 9 e-computations (without any
exponentiations).
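The counts in Example 2 can be checked mechanically. The following Python sketch builds the matrices $A_\ell$ and the matrix $B^*$ for $k = 2$. It assumes the index map $\tau(i,j) = i(i-1)/2 + j$ and the row ordering $(s-1)(k+1)+t$; the function names `tau`, `build_A` and `build_B` are illustrative and do not come from the paper.

```python
def tau(i, j):
    # Assumed index map on S = {(i, j) : 1 <= j <= i <= k + 1}.
    return i * (i - 1) // 2 + j


def build_A(k):
    """Return the list A_1, ..., A_m of (k+1) x (k+1) 0/1 matrices."""
    n = k + 1
    m = n * (n + 1) // 2
    A = [[[0] * n for _ in range(n)] for _ in range(m)]
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            ell = tau(i, j) - 1
            A[ell][i - 1][j - 1] = 1
            A[ell][j - 1][i - 1] = 1  # same entry when i == j
    return A


def build_B(k):
    """Row (s-1)(k+1)+t, column ell of B holds entry (s, t) of A_ell."""
    n = k + 1
    A = build_A(k)
    return [[A[ell][s][t] for ell in range(len(A))]
            for s in range(n) for t in range(n)]


B_star = build_B(2)
ones = sum(map(sum, B_star))
columns_used = {row.index(1) for row in B_star}
print(len(B_star), len(B_star[0]), ones, len(columns_used))
```

Every row of the resulting matrix contains a single 1, which is why each of the nine base pairings is used exactly once and no exponentiation is needed, and all six columns are hit, matching the target group $\mathbb{G}_t^6$.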


5 Application 

On the basis of our optimal projecting symmetric bilinear pairings, we derive
new instantiations of three distinct cryptosystems with improved efficiency. In
particular, we apply the projecting symmetric bilinear group generator
$\mathcal{G}_2^{B^*}$ of Example 2 to the Groth-Sahai non-interactive proof system, the
Boneh-Goh-Nissim cryptosystem, and the Seo-Cheon round-optimal blind signature
scheme. Because of space constraints, we leave details to the full version.
Abstract. In recent years the use of large matrices and their algebraic
properties has proved useful for instantiating new cryptographic
primitives like Lossy Trapdoor Functions and encryption schemes with
improved security, like Key Dependent Message resilience. In these
constructions the rank of a matrix is assumed to be hard to guess when
the matrix is hidden by elementwise exponentiation. This problem, which
we call here the Rank Problem, is known to be related to the Decisional
Diffie-Hellman problem, but in the known reductions between both problems
there appears a loss-factor in the advantage which grows linearly
with the rank of the matrix.

In this paper, we give a new and better reduction between the Rank
problem and the Decisional Diffie-Hellman problem, such that the reduction
loss-factor depends logarithmically on the rank. This new reduction
can be applied to a number of cryptographic constructions, improving
their efficiency. The main idea in the reduction is to build from a DDH
tuple a matrix whose rank shifts from r to 2r, and then apply a hybrid
argument to deal with the general case. In particular this technique widens
the range of possible values of the ranks that are tightly related to DDH.

On the other hand, the new reduction is optimal, as we show the
nonexistence of more efficient reductions in a wide class containing all
the “natural” ones (i.e., black-box and algebraic). The result is twofold:
there is no (natural) way to build a matrix whose rank shifts from r to
2r + a for a > 0, and no hybrid argument can improve the logarithmic
loss-factor obtained in the new reduction.

The techniques used in the paper extend naturally to other “algebraic”
problems like the Decisional Linear or the Decisional 3-Party Diffie-Hellman
problems, also obtaining reductions of logarithmic complexity.

Keywords: Rank Problem, Decisional Diffie-Hellman Problem, Black-Box
Reductions, Algebraic Reductions, Decision Linear Problem.

1 Introduction 

Motivation. In recent years the use of large matrices and their algebraic
properties has proved useful for instantiating new cryptographic primitives like
Lossy Trapdoor Functions [?] and encryption schemes with improved
security, like Key Dependent Message [2]. In these constructions the rank of a
matrix is assumed to be hard to guess when the matrix is hidden by elementwise
exponentiation. This problem, which we call here the Rank Problem, is known to
be related to the Decisional Diffie-Hellman (DDH) problem, but in the known
reductions between both problems there appears a loss-factor in the adversary’s
advantage which grows linearly with the rank of the matrix. The Rank Problem
first appeared in some papers under the names Matrix-DDH [2] and Matrix
d-Linear [?].

In the cryptographic constructions mentioned above, some secret values (messages
or keys) are encoded as group element vectors and then hidden by multiplying
them by an invertible matrix. The secret value is recovered by inverting
the operations: first multiplying by the inverse matrix and then inverting the
encoding as group elements. This last step requires encoding only a few bits
(typically, a single bit) in each group element, forcing the length of the vector and the
rank of the matrix to be comparable to the binary length of the secret. Security
of these schemes is related to the indistinguishability of full-rank matrices from
low-rank (e.g., rank 1) matrices: If the invertible matrix is replaced by a low-rank
one, the secret value is information-theoretically hidden. Therefore, the security
of these schemes is related to the hardness of the Rank problem for matrices of
large rank (e.g., 320 or 1024).

Reductions of the DDH problem to the Rank problem are based on the obvious
relationship between them in the case of $2 \times 2$ matrices. Namely, from a DDH
problem tuple $(g, g^x, g^y, g^z)$ one can build a matrix
$g^M = \begin{pmatrix} g & g^x \\ g^y & g^z \end{pmatrix}$, which is the
elementwise exponentiation of the matrix
$M = \begin{pmatrix} 1 & x \\ y & z \end{pmatrix}$. For a 0-instance of
DDH (i.e., $z = xy$), $\det M = 0$, while for a 1-instance (i.e., $z \neq xy$),
$\det M \neq 0$, and therefore, the rank of $M$ shifts from 1 to 2 depending on the DDH
instance. This technique can be applied to larger (even non-square) matrices by
just padding the previous $2 \times 2$ block with some ones in the diagonal and zeroes
elsewhere, thus increasing the rank from 1 or 2 to $r + 1$ or $r + 2$, where $r$ is the
number of ones added to the diagonal.
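As a small numerical illustration of this rank shift, the Python sketch below works directly with the exponent matrix $M$ over $\mathbb{Z}_q$ (a real reduction only sees the group elements $g^M$). The prime `q = 101`, the sample exponents and the helper names `rank_mod_q` and `ddh_matrix` are assumptions made for the example.

```python
def rank_mod_q(rows, q):
    """Rank of an integer matrix over the prime field Z_q (Gaussian elimination)."""
    m = [[v % q for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, q)  # modular inverse, Python 3.8+
        m[rank] = [(v * inv) % q for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % q for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def ddh_matrix(x, y, z, q, pad=0):
    """Exponent matrix [[1, x], [y, z]] padded with `pad` ones on the diagonal."""
    size = 2 + pad
    m = [[0] * size for _ in range(size)]
    m[0][0], m[0][1], m[1][0], m[1][1] = 1, x % q, y % q, z % q
    for i in range(2, size):
        m[i][i] = 1
    return m


q = 101
x, y = 17, 45
r0 = rank_mod_q(ddh_matrix(x, y, x * y, q), q)             # 0-instance
r1 = rank_mod_q(ddh_matrix(x, y, x * y + 1, q), q)         # 1-instance
p0 = rank_mod_q(ddh_matrix(x, y, x * y, q, pad=3), q)      # padded 0-instance
p1 = rank_mod_q(ddh_matrix(x, y, x * y + 1, q, pad=3), q)  # padded 1-instance
print(r0, r1, p0, p1)
```

With three padding ones the two ranks become $r + 1 = 4$ and $r + 2 = 5$, so the gap between the two cases stays equal to one however much padding is added.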

Now, a general reduction of DDH to any instance of the Rank problem (i.e.,
telling apart hidden matrices of ranks $r_1$ and $r_2$) is obtained by applying a hybrid
argument, incurring a loss-factor in the adversary’s advantage which grows
linearly in the rank difference $r_2 - r_1$.

This loss-factor has an extra impact on the efficiency of the cryptographic 
schemes based on matrices: For the same security level the size of the group has 
to be increased, and therefore the sizes of public keys, ciphertexts, etc. increase 
accordingly. 

Until now it was an open problem to find a tighter reduction of DDH to the
Rank problem. To address this kind of problem one can choose between building
new tighter reductions or showing impossibility results. However, most of
the known impossibility results are quite limited because they only claim the
nonexistence of reductions of a certain type (e.g., black-box, algebraic, etc.). But
still these negative results have some value since they capture all possible ‘natural’
reductions between computational problems, at least in the generic case (e.g.,
without using specific properties of certain groups and their representation).

Main Results. In this paper, we give a new and better reduction between
the Rank and the DDH problems, such that the reduction loss-factor grows
logarithmically with the rank of the matrices. This new reduction can be applied
to a number of cryptographic constructions, improving their efficiency. The main
idea in the reduction is to build from a DDH tuple a matrix whose rank shifts
from r to 2r, and then apply a hybrid argument to deal with the general case.

On the other hand, the new reduction is optimal: We show the nonexistence
of more efficient reductions in a wide class containing all the “natural” ones
(i.e., black-box and algebraic). The result is twofold: There is no (natural) way
to build a matrix whose rank shifts from r to 2r + a for a > 0, and no hybrid
argument can improve the logarithmic loss-factor obtained in the new reduction.
Basically, the new reduction achieves the following result.

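(Informal) Theorem 1. For any $\ell_1, \ell_2, r_1, r_2$ such that
$1 \le r_1 < r_2 \le \min(\ell_1, \ell_2)$ there is a reduction of the DDH problem to the
Rank problem for $\ell_1 \times \ell_2$ matrices of rank either $r_1$ or $r_2$, where the
advantages of the problem solvers fulfil

$$\mathsf{AdvRank}(\mathcal{G}, \ell_1, \ell_2, r_1, r_2; t) \le \left\lceil \log_2 \frac{r_2}{r_1} \right\rceil \mathsf{AdvDDH}(\mathcal{G}; t')$$

and their running times $t$ and $t'$ are essentially equal.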

In particular, our reduction relates the DDH Problem to the hardness of telling
apart $\ell \times \ell$ full-rank matrices from rank 1 matrices with a loss-factor of only
$\log_2 \ell$, instead of the factor $\ell$ obtained in previous reductions. Moreover, the
previous reductions are tight only for ranks $r_1$ and $r_2$ such that $r_2 = r_1 + 1$,
while our results show that there exists a tight reduction for $r_1 < r_2 \le 2r_1$.

At this point, the natural question arises of whether a tight reduction exists
for a wider range of the ranks $r_1$ and $r_2$. However, we show the optimality of the
new reduction by the following negative result.

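(Informal) Theorem 2. For any $\ell_1, \ell_2, r_1, r_2$ such that
$1 \le r_1 < r_2 \le \min(\ell_1, \ell_2)$ and any ‘natural’ reduction $\mathcal{R}$ of DDH to the
Rank problem, the advantages of the Rank problem solver $\mathcal{A}$ and the DDH solver
$\mathcal{R}[\mathcal{A}]$ fulfil

$$\mathsf{AdvRank}_{\mathcal{A}}(\mathcal{G}, \ell_1, \ell_2, r_1, r_2; t) \ge \left\lceil \log_2 \frac{r_2}{r_1} \right\rceil \mathsf{AdvDDH}_{\mathcal{R}[\mathcal{A}]}(\mathcal{G}; t') - \epsilon$$

where the running times $t$, $t'$ are similar and $\epsilon$ is a negligible quantity.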

Here, ‘natural reduction’ basically means a black-box reduction which transforms
a DDH tuple into a hidden matrix by performing only (probabilistic) algebraic
manipulations, which are essentially linear combinations of the exponents with
known integer coefficients, depending on the random coins of the reduction.

All generic reductions from computational problems based on cyclic groups
fall into this category. Therefore, this result has to be interpreted as saying that
one cannot expect to find a tighter reduction for a large class of groups unless a new
(non-black-box or non-algebraic) technique is used. Nevertheless, falsifying this
negative result would imply an improvement in the efficiency of the cryptosystems
based on matrices, or even the discovery of a new reduction technique.

The techniques used in the paper extend naturally to other “algebraic” problems
like the Decisional Linear (DLin) or the Decisional 3-Party Diffie-Hellman
(D3DH) problems, also obtaining reductions with logarithmic complexity. Actually,
these reductions recently appeared in [3] and [?].

(Informal) Theorem 3. For any $\ell_1, \ell_2, r_1, r_2$ such that
$2 \le r_1 < r_2 \le \min(\ell_1, \ell_2)$ there is a reduction of the DLin problem to the
Rank problem for $\ell_1 \times \ell_2$ matrices of rank either $r_1$ or $r_2$, where the
advantages of the problem solvers fulfil

$$\mathsf{AdvRank}(\mathcal{G}, \ell_1, \ell_2, r_1, r_2; t) \le \left\lceil 1.71 \log_2 \frac{r_2 - 1}{r_1 - 1} \right\rceil \mathsf{AdvDLin}(\mathcal{G}; t')$$

and their running times $t$ and $t'$ are essentially equal.

(Informal) Theorem 4. For any $\ell_1, \ell_2, r_1, r_2$ such that
$2 \le r_1 < r_2 \le \min(\ell_1, \ell_2)$ there is a reduction of the D3DH problem to the
Rank problem for $\ell_1 \times \ell_2$ matrices of rank either $r_1$ or $r_2$, where the
advantages of the problem solvers fulfil

$$\mathsf{AdvRank}(\mathcal{G}, \ell_1, \ell_2, r_1, r_2; t) \le \left\lceil 1.71 \log_2 \frac{r_2 - 1}{r_1 - 1} \right\rceil \mathsf{AdvD3DH}(\mathcal{G}; t')$$

and their running times $t$ and $t'$ are essentially equal.

Negative results similar to Theorem 2 are also given, but in these two cases the
reductions are shown to be optimal up to a constant factor of 1.71.

Further Research. Some of the ideas and techniques used in the paper suggest
that the problem of the optimality of certain types of reductions for a class of
decisional assumptions can be studied from the point of view of Algebraic
Geometry. In particular, this could help to close the gap in the loss-factor between
the reduction and the lower bound when reducing DLin or D3DH to Rank, and
could make it possible to obtain similar results for a broad class of computational
problems. A second open problem is how the techniques and results adapt to the
case of composite-order groups, especially when the factorization of the order, or
the order itself, is unknown.

Roadmap. The paper starts with some notation and basic lemmas, in Section 2.
Then the Rank Problem and the new reduction of DDH are presented in Section 3.
The optimality of the reduction is studied in Section 4. In the last section of the
paper, the previous results are extended to other “algebraic” decisional problems
like DLin or D3DH.

2 Notation and Basic Lemmas

Let $\mathcal{G}$ be a group of prime order $q$, and let $g$ be a random generator of $\mathcal{G}$. For
convenience we will use additive notation for all groups. In particular, $0_{\mathcal{G}}$ denotes
the neutral element in $\mathcal{G}$, whereas $1_{\mathcal{G}}$ denotes the generator $g$. Analogously,
$x1_{\mathcal{G}}$, or simply $x_{\mathcal{G}}$, denotes the result of $g^x$, for any integer $x \in \mathbb{Z}_q$.
The additive notation extends to vectors and matrices of elements in $\mathcal{G}$, in the
natural way. That is, given a vector $x = (x_1, \ldots, x_\ell) \in \mathbb{Z}_q^\ell$, we will write
$x_{\mathcal{G}} = ((x_1)_{\mathcal{G}}, \ldots, (x_\ell)_{\mathcal{G}})$, and the same for matrices.
$\mathbb{Z}_q^{\ell_1 \times \ell_2}$ denotes the set of all $\ell_1 \times \ell_2$ matrices, and
$\mathbb{Z}_q^{\ell_1 \times \ell_2; r}$ is used for the subset of those matrices with rank $r$. In the
special case of invertible matrices we will write
$GL_\ell(\mathbb{Z}_q) = \mathbb{Z}_q^{\ell \times \ell; \ell}$. The sets of matrices
with entries in $\mathcal{G}$, which we write $\mathcal{G}^{\ell_1 \times \ell_2}$,
$\mathcal{G}^{\ell_1 \times \ell_2; r}$ and $GL_\ell(\mathcal{G})$, are defined in
the natural way by replacing every matrix $M$ by $M_{\mathcal{G}}$. Notice that the sets are
independent of the choice of the group generator $1_{\mathcal{G}}$.

An element $x_{\mathcal{G}} = x1_{\mathcal{G}} \in \mathcal{G}$ and an integer $a \in \mathbb{Z}_q$ can be
operated together: $a x_{\mathcal{G}} = (ax \bmod q)1_{\mathcal{G}} = (ax)_{\mathcal{G}} = x a_{\mathcal{G}}$.
These operations extend to vectors and matrices in the natural way. Therefore, for any
two matrices $A \in \mathbb{Z}_q^{\ell_1 \times \ell_2}$ and $B \in \mathbb{Z}_q^{\ell_2 \times \ell_3}$,
we have $A_{\mathcal{G}} B = A B_{\mathcal{G}} = (AB)_{\mathcal{G}}$.

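For convenience we will use the notation $A \oplus B$ for block matrix concatenation:

$$A \oplus B = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}.$$

In addition, $I_\ell$ and $0_{\ell_1 \times \ell_2}$ respectively denote the neutral element in
$GL_\ell(\mathbb{Z}_q)$ and the null matrix in $\mathbb{Z}_q^{\ell_1 \times \ell_2}$. The shorthand
$0_\ell = 0_{\ell \times \ell}$ is also used. Given a matrix
$A \in \mathbb{Z}_q^{\ell_1 \times \ell_2}$, the transpose of $A$ is denoted by $A^\top$, and the
vector subspace spanned by the columns of $A$ is denoted by
$\mathrm{Span}\, A \subseteq \mathbb{Z}_q^{\ell_1}$, whose dimension equals $\mathrm{rank}\, A$.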

Uniform sampling of a set $S$ is written as $x \in_R S$. In addition, sampling
from a probability distribution $D$ whose support is included in $S$ is denoted by
$x \leftarrow D$, while $x \leftarrow \mathcal{A}(a)$ denotes that $x$ is the result of running a
(probabilistic) algorithm $\mathcal{A}$ on some input $a$.

As usual, a positive function $f : \mathbb{Z}^+ \to \mathbb{R}^+$ is called negligible if
$f(\lambda)$ decreases faster than $\lambda^{-c}$ for any positive constant $c$. We denote this by
$f(\lambda) \in \mathrm{negl}(\lambda)$. Similarly, $f(\lambda) > \mathrm{negl}(\lambda)$ denotes that
$f(\lambda)$ is non-negligible in $\lambda$.

Lemma 1. The following three natural group actions are transitive:¹

1. the left-action of $GL_{\ell_1}(\mathbb{Z}_q)$ on $\mathbb{Z}_q^{\ell_1 \times \ell_2; \ell_2}$, for
   $\ell_1 \ge \ell_2$, defined by $A \mapsto UA$, where $U \in GL_{\ell_1}(\mathbb{Z}_q)$ and
   $A \in \mathbb{Z}_q^{\ell_1 \times \ell_2; \ell_2}$;

2. the right-action of $GL_{\ell_2}(\mathbb{Z}_q)$ on $\mathbb{Z}_q^{\ell_1 \times \ell_2; \ell_1}$, for
   $\ell_1 \le \ell_2$, defined by $A \mapsto AV$, where $V \in GL_{\ell_2}(\mathbb{Z}_q)$ and
   $A \in \mathbb{Z}_q^{\ell_1 \times \ell_2; \ell_1}$;

3. the left-right-action of $GL_{\ell_1}(\mathbb{Z}_q) \times GL_{\ell_2}(\mathbb{Z}_q)$ on
   $\mathbb{Z}_q^{\ell_1 \times \ell_2; r}$, defined by $A \mapsto UAV$, where
   $U \in GL_{\ell_1}(\mathbb{Z}_q)$, $V \in GL_{\ell_2}(\mathbb{Z}_q)$ and
   $A \in \mathbb{Z}_q^{\ell_1 \times \ell_2; r}$.

¹ The action of a group $G$ on a set $A$ is transitive if for any $a, b \in A$ there exists
$g \in G$ such that $b = g \cdot a$. As a consequence, if $g \in_R G$ then for any $a \in A$,
$g \cdot a$ is uniform in $A$.

Lemma 2 (Rank Decomposition). Given any matrix $A \in \mathbb{Z}_q^{\ell_1 \times \ell_2; r}$, there
exist matrices $L \in \mathbb{Z}_q^{\ell_1 \times r; r}$ and $R \in \mathbb{Z}_q^{r \times \ell_2; r}$ such
that $A = LR$.
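One standard way to obtain such a decomposition, sketched below in Python, takes the pivot columns of $A$ as $L$ and the non-zero rows of the reduced row echelon form of $A$ as $R$. This is an illustrative construction, not necessarily the one intended by the author; the helper names, the prime `q = 7` and the sample matrix are assumptions, and the sketch expects a non-zero matrix.

```python
def rref_mod_q(rows, q):
    """Return (reduced row echelon form, pivot column indices) over Z_q."""
    m = [[v % q for v in row] for row in rows]
    pivots = []
    r = 0
    for col in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][col]), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        inv = pow(m[r][col], -1, q)  # modular inverse, Python 3.8+
        m[r] = [(v * inv) % q for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][col]:
                f = m[i][col]
                m[i] = [(a - f * b) % q for a, b in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
    return m, pivots


def rank_decomposition(A, q):
    """Return (L, R) with A = L R mod q, L of size l1 x r and R of size r x l2."""
    reduced, pivots = rref_mod_q(A, q)
    L = [[row[c] % q for c in pivots] for row in A]  # pivot columns of A
    R = reduced[:len(pivots)]                        # non-zero rows of the RREF
    return L, R


def mat_mul(X, Y, q):
    return [[sum(a * b for a, b in zip(row, col)) % q for col in zip(*Y)]
            for row in X]


q = 7
A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]  # rank 2 over Z_7
L, R = rank_decomposition(A, q)
print(L, R, mat_mul(L, R, q))
```

Both factors have rank $r$ by construction: the pivot columns of $A$ are linearly independent, and the non-zero rows of the echelon form contain an identity submatrix.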

3 The Rank Problem and the New Reduction of DDH to Rank

We consider an assumption related to matrices, which is weaker than some well-known
assumptions like the Decisional Diffie-Hellman, the Decisional Linear [?]
and the Decisional 3-Party Diffie-Hellman [?] assumptions. Given an (additive)
cyclic group $\mathcal{G}$ of prime order $q$ of binary length $\lambda$, the
$\mathsf{Rank}(\mathcal{G}, \ell_1, \ell_2, r_1, r_2)$ problem informally consists of distinguishing
whether a given matrix in $\mathcal{G}^{\ell_1 \times \ell_2}$ has either rank $r_1$ or rank $r_2$,
for given integers $r_1 < r_2$. The problem is formally defined through the following two
experiments between a challenger and a distinguisher $\mathcal{A}$.

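Experiment $\mathsf{ExpRank}^b_{\mathcal{A}}(\mathcal{G}, \ell_1, \ell_2, r_1, r_2)$ is defined as follows, for $b = 0, 1$.

1. If $b = 0$, the challenger chooses $M \in_R \mathbb{Z}_q^{\ell_1 \times \ell_2; r_1}$ and sends $M_{\mathcal{G}}$ to $\mathcal{A}$.

   If $b = 1$, the challenger chooses $M \in_R \mathbb{Z}_q^{\ell_1 \times \ell_2; r_2}$ and sends $M_{\mathcal{G}}$ to $\mathcal{A}$.

2. The distinguisher $\mathcal{A}$ outputs a bit $b' \in \{0, 1\}$.

Let $E_b$ be the event that $\mathcal{A}$ outputs $b' = 1$ in
$\mathsf{ExpRank}^b_{\mathcal{A}}(\mathcal{G}, \ell_1, \ell_2, r_1, r_2)$. The
advantage of $\mathcal{A}$ is defined as
$\mathsf{AdvRank}_{\mathcal{A}}(\mathcal{G}, \ell_1, \ell_2, r_1, r_2) = |\Pr[E_0] - \Pr[E_1]|$.
We can then define

$$\mathsf{AdvRank}(\mathcal{G}, \ell_1, \ell_2, r_1, r_2; t) = \max_{\mathcal{A}} \{ \mathsf{AdvRank}_{\mathcal{A}}(\mathcal{G}, \ell_1, \ell_2, r_1, r_2) \}$$

where the maximum is taken over all $\mathcal{A}$ running within time $t$.

Definition 1. The $\mathsf{Rank}(\mathcal{G}, \ell_1, \ell_2, r_1, r_2)$ assumption in a group
$\mathcal{G}$ states that $\mathsf{AdvRank}(\mathcal{G}, \ell_1, \ell_2, r_1, r_2; t)$ is negligible in
$\lambda = \log |\mathcal{G}|$ for any value of $t$ that is polynomial in $\lambda$.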

The Rank assumption appeared in recent papers under the names Matrix-DDH [2]
and Matrix d-Linear [?]. However, the reduction given in the next
proposition substantially improves the reductions previously known. Namely,
the loss factor in the new reduction no longer grows linearly but logarithmically
in the rank.

Firstly, note that the $\mathsf{Rank}(\mathcal{G}, \ell_1, \ell_2, r_1, r_2)$ problem is random
self-reducible, since by Lemma 1, given $M_0 \in \mathbb{Z}_q^{\ell_1 \times \ell_2; k}$, for random
$L \in_R GL_{\ell_1}(\mathbb{Z}_q)$ and $R \in_R GL_{\ell_2}(\mathbb{Z}_q)$ the product $L M_0 R$ is
uniformly distributed in $\mathbb{Z}_q^{\ell_1 \times \ell_2; k}$.

Lemma 3. Any distinguisher for $\mathsf{Rank}(\mathcal{G}, \ell_1, \ell_2, k - \delta, k)$,
$\ell_1, \ell_2 \ge 2$, $k \ge 2$, $1 \le \delta \le \lfloor k/2 \rfloor$, can be converted into a
distinguisher for the Decisional Diffie-Hellman (DDH) problem, with the same advantage
and with essentially the same running time.


J.L. Villar 


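Proof. Given a DDH instance $(1, x, y, z)_{\mathcal{G}}$, the DDH distinguisher builds the
$\ell_1 \times \ell_2$ matrix

$$M_{\mathcal{G}} = \Big( \underbrace{\begin{pmatrix} 1 & x \\ y & z \end{pmatrix} \oplus \cdots \oplus \begin{pmatrix} 1 & x \\ y & z \end{pmatrix}}_{\delta \text{ times}} \oplus\, I_{k - 2\delta} \oplus 0_{(\ell_1 - k) \times (\ell_2 - k)} \Big)_{\mathcal{G}}$$

and submits the randomized matrix $L M_{\mathcal{G}} R$ to the
$\mathsf{Rank}(\mathcal{G}, \ell_1, \ell_2, k - \delta, k)$ distinguisher, where
$L \in_R GL_{\ell_1}(\mathbb{Z}_q)$ and $R \in_R GL_{\ell_2}(\mathbb{Z}_q)$. Notice that if
$z = xy \bmod q$ then the resulting matrix is a random matrix in
$\mathcal{G}^{\ell_1 \times \ell_2; k - \delta}$. Otherwise, it
is a random matrix in $\mathcal{G}^{\ell_1 \times \ell_2; k}$. □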
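The block structure used in this proof can be reproduced on the exponents. The Python sketch below builds the matrix $M$ over $\mathbb{Z}_q$ (omitting the final randomization by $L$ and $R$, and working with exponents in the clear rather than group elements) and checks that its rank is $k - \delta$ or $k$. The prime `q = 1009`, the dimensions and the helper names are assumptions chosen for the example.

```python
def rank_mod_q(rows, q):
    """Rank of an integer matrix over the prime field Z_q (Gaussian elimination)."""
    m = [[v % q for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, q)  # modular inverse, Python 3.8+
        m[rank] = [(v * inv) % q for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % q for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def lemma3_matrix(x, y, z, q, l1, l2, k, delta):
    """delta copies of [[1, x], [y, z]], then I_{k-2*delta}, then a zero block."""
    assert 1 <= delta <= k // 2 and k <= min(l1, l2)
    m = [[0] * l2 for _ in range(l1)]
    for b in range(delta):
        i = 2 * b
        m[i][i], m[i][i + 1] = 1, x % q
        m[i + 1][i], m[i + 1][i + 1] = y % q, z % q
    for i in range(2 * delta, k):
        m[i][i] = 1
    return m


q = 1009
x, y = 123, 456
l1, l2, k, delta = 6, 7, 5, 2
low = lemma3_matrix(x, y, x * y, q, l1, l2, k, delta)       # z = xy
high = lemma3_matrix(x, y, x * y + 5, q, l1, l2, k, delta)  # z != xy
print(rank_mod_q(low, q), rank_mod_q(high, q))
```

Each of the $\delta$ blocks contributes rank 1 or 2, so a single DDH tuple moves the rank by $\delta$ in one step instead of by one.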

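Theorem 1. For any $\ell_1, \ell_2, r_1, r_2$ such that $1 \le r_1 < r_2 \le \min(\ell_1, \ell_2)$ we have

$$\mathsf{AdvRank}(\mathcal{G}, \ell_1, \ell_2, r_1, r_2; t) \le \left\lceil \log_2 \frac{r_2}{r_1} \right\rceil \mathsf{AdvDDH}(\mathcal{G}; t')$$

where $t' = t + O(\ell_1 \ell_2 (\ell_1 + \ell_2))$, taking the cost of a scalar multiplication in
$\mathcal{G}$ as one time unit.

Proof. We proceed by applying a hybrid argument. Let us consider the sequence
of integers $\{n_i\}$ defined by $n_i = r_1 2^i$, and let $k$ be the smallest index such
that $n_k \ge r_2$, that is $k = \lceil \log_2 r_2 - \log_2 r_1 \rceil$. Then define a sequence of
random matrices $\{M_{i\mathcal{G}}\}$, where $M_i \in_R \mathbb{Z}_q^{\ell_1 \times \ell_2; n_i}$ for
$i = 0, \ldots, k - 1$, and $M_k \in_R \mathbb{Z}_q^{\ell_1 \times \ell_2; r_2}$. For any distinguisher
$\mathcal{A}_{\mathsf{Rank}}$ with running time upper bounded by $t$, let
$p_i = \Pr[1 \leftarrow \mathcal{A}_{\mathsf{Rank}}(M_{i\mathcal{G}})]$. By Lemma 3,

$$|p_{i+1} - p_i| = \mathsf{AdvRank}_{\mathcal{A}_{\mathsf{Rank}}}(\mathcal{G}, \ell_1, \ell_2, n_i, n_{i+1}) \le \mathsf{AdvDDH}(\mathcal{G}; t')$$

for $i = 0, \ldots, k - 2$, and

$$|p_k - p_{k-1}| = \mathsf{AdvRank}_{\mathcal{A}_{\mathsf{Rank}}}(\mathcal{G}, \ell_1, \ell_2, n_{k-1}, r_2) \le \mathsf{AdvDDH}(\mathcal{G}; t').$$

Therefore,

$$\mathsf{AdvRank}_{\mathcal{A}_{\mathsf{Rank}}}(\mathcal{G}, \ell_1, \ell_2, r_1, r_2) = |p_k - p_0| \le |p_1 - p_0| + \cdots + |p_k - p_{k-1}| \le k \cdot \mathsf{AdvDDH}(\mathcal{G}; t')$$

which leads to the desired result. □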
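The sequence of hybrid ranks in this proof is easy to enumerate. The short Python sketch below lists the chain $r_1, 2r_1, 4r_1, \ldots, r_2$ and compares the number of hybrid steps with the rank difference $r_2 - r_1$ of the earlier one-step-at-a-time argument; the function name `hybrid_ranks` and the sample parameters are illustrative choices.

```python
def hybrid_ranks(r1, r2):
    """Ranks n_0 = r1, n_1 = 2*r1, ... doubling while below r2; the last entry is r2."""
    assert 1 <= r1 < r2
    ranks = [r1]
    while 2 * ranks[-1] < r2:
        ranks.append(2 * ranks[-1])
    ranks.append(r2)
    return ranks


chain = hybrid_ranks(1, 320)
steps = len(chain) - 1
print(chain)
print(steps, 320 - 1)  # logarithmic vs. linear number of hybrid steps
```

Consecutive ranks in the chain never more than double, which is exactly the condition $\delta \le \lfloor k/2 \rfloor$ needed to apply Lemma 3 at every step.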


4 Optimality of the Reduction 

In this section we show that there does not exist any reduction of DDH to the
Rank problem that improves the result in Theorem 1, unless it falls out of the
class of reductions that we call black-box algebraic reductions.

4.1 Black-Box Algebraic Reductions

Formally, a reduction $\mathcal{R}$ of a computational problem $\mathcal{P}_1$ to a problem
$\mathcal{P}_2$ efficiently transforms any probabilistic polynomial time algorithm $\mathcal{A}_2$
solving $\mathcal{P}_2$ with a non-negligible advantage $\epsilon_2$ into another probabilistic
polynomial time algorithm $\mathcal{A}_1 = \mathcal{R}[\mathcal{A}_2]$ solving $\mathcal{P}_1$ with a
non-negligible advantage $\epsilon_1$. The reduction $\mathcal{R}$ is called black-box if
$\mathcal{A}_1$ is just a probabilistic polynomial time algorithm with oracle access to $\mathcal{A}_2$.

In this paper we focus on the optimality of a reduction, measured in terms of
the advantages of $\mathcal{A}_1$ and $\mathcal{A}_2$. However, to be meaningful we need to add another
requirement to the reduction: The running times of $\mathcal{A}_1$ and $\mathcal{A}_2$ are similar.
Otherwise, one can arbitrarily increase the advantage of $\mathcal{A}_1$ by repetition, thus
making more than one oracle call to $\mathcal{A}_2$. We must add a qualifier and say that
the reduction is then time-preserving black-box. However, for simplicity we will
omit it and simply refer to black-box reductions.

Following [?], we say that $\mathcal{R}$ is algebraic with respect to a group $\mathcal{G}$ if it
only performs group operations on the elements of $\mathcal{G}$ (i.e., group operation,
inversion and comparison for equality), while there is no limitation on the operations
performed on other data types. Although the notion of black-box algebraic
reduction is theoretically very limited, it captures all the ‘natural’ reductions,
since all known reductions between problems related to the discrete logarithm in
cyclic groups fall into this category. See [?] for a deeper discussion on algebraic
reductions and their relation with the generic group model.

In the definition of an algebraic algorithm $\mathcal{R}$ it is assumed that there exists
an efficient extractor that, from the inputs of $\mathcal{R}$ (including the random tape)
and the code of $\mathcal{R}$, extracts a representation of every group element in $\mathcal{R}$’s
output as a multiexponentiation of the base formed by the group elements in the
input of $\mathcal{R}$. However, here we only require that for every value of the random
tape of $\mathcal{R}$ there exists such a representation, and that it is independent of the group
elements on the input of $\mathcal{R}$. More precisely, if $g_1, \ldots, g_m \in \mathcal{G}$ are the group
elements in the input of $\mathcal{R}$ and $h_1, \ldots, h_n \in \mathcal{G}$ are the group elements in the
output, then for any choice of the other inputs and the random tape, there exist
coefficients $\alpha_{ij} \in \mathbb{Z}_q$ such that $h_i = \alpha_{i1} g_1 + \ldots + \alpha_{im} g_m$, for
$i = 1, \ldots, n$. Notice that this is true as long as $\mathcal{R}$ performs only group operations
on the group elements.

We stress that reductions using more intricate operations than the group
operations defined in $\mathcal{G}$ may exist. However, there is little hope
of being able to control the rank of the manipulated matrices, except for the trivial
fact that a random matrix has maximal rank with overwhelming probability.


4.2 Canonical Solvers 

In this paper, we consider only reductions $\mathcal{R}$ of some decisional problem (like
DDH) to the Rank problem (say $\mathsf{Rank}(\mathcal{G}, \ell_1, \ell_2, r_1, r_2)$). Therefore, in a
(time-preserving) black-box reduction, having oracle access to a solver $\mathcal{A}_2$ of Rank
exactly means that $\mathcal{R}$ computes some matrix in $\mathcal{G}^{\ell_1 \times \ell_2}$, and uses it as
input of $\mathcal{A}_2$, then obtaining a bit $b' \in \{0, 1\}$ as its output. Therefore, $\mathcal{R}$ is
nothing more than a way to obtain a matrix from a DDH instance by an algebraic function.

As the Rank problem is random self-reducible, one can consider the notion of a
canonical solver $\mathcal{A}$ for $\mathsf{Rank}(\mathcal{G}, \ell_1, \ell_2, r_1, r_2)$. In a first stage, a
canonical solver, on the input of a matrix $M_{\mathcal{G}} \in \mathcal{G}^{\ell_1 \times \ell_2}$, computes
the randomized matrix $\overline{M}_{\mathcal{G}} = L M_{\mathcal{G}} R$ for randomly chosen
$L \in GL_{\ell_1}(\mathbb{Z}_q)$ and $R \in GL_{\ell_2}(\mathbb{Z}_q)$, and then uses
it as input of the second stage. Observe that $M_{\mathcal{G}}$ and $\overline{M}_{\mathcal{G}}$ always have
the same rank, and they are nearly independent. Indeed $M_{\mathcal{G}}$ and $\overline{M}_{\mathcal{G}}$
conditioned on any specific value of the rank $r$ are independent random variables, and
$\overline{M}_{\mathcal{G}}$ is uniformly distributed in $\mathcal{G}^{\ell_1 \times \ell_2; r}$.
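The randomization stage of a canonical solver can be sketched on exponent matrices as follows. This Python example works over $\mathbb{Z}_q$ in the clear, whereas an actual solver would apply $L$ and $R$ to the group elements $M_{\mathcal{G}}$ by scalar multiplications. The helper names, the prime `q = 101`, the fixed seed and the use of Python's non-cryptographic `random` module are assumptions made for illustration.

```python
import random


def rank_mod_q(rows, q):
    """Rank of an integer matrix over the prime field Z_q (Gaussian elimination)."""
    m = [[v % q for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, q)  # modular inverse, Python 3.8+
        m[rank] = [(v * inv) % q for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % q for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def mat_mul(X, Y, q):
    return [[sum(a * b for a, b in zip(row, col)) % q for col in zip(*Y)]
            for row in X]


def random_invertible(n, q, rng):
    """Rejection-sample a uniform element of GL_n(Z_q)."""
    while True:
        m = [[rng.randrange(q) for _ in range(n)] for _ in range(n)]
        if rank_mod_q(m, q) == n:
            return m


def randomize(M, q, rng):
    """Return L M R for random invertible L and R of matching sizes."""
    L = random_invertible(len(M), q, rng)
    R = random_invertible(len(M[0]), q, rng)
    return mat_mul(mat_mul(L, M, q), R, q)


rng = random.Random(2012)
q = 101
M = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]  # rank 2, size 3 x 4
M_bar = randomize(M, q, rng)
print(rank_mod_q(M, q), rank_mod_q(M_bar, q))
```

Multiplying by invertible matrices never changes the rank, so the second stage of the solver receives a matrix that carries no information about the input beyond its rank.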

Moreover, for any solver $\mathcal{A}$ of $\mathsf{Rank}(\mathcal{G}, \ell_1, \ell_2, r_1, r_2)$ we build a
canonical solver $\overline{\mathcal{A}}$ from $\mathcal{A}$ with the same advantage, by just inserting
the initial randomization step. As a consequence, to obtain a negative result about the
existence of black-box reductions of some problem to
$\mathsf{Rank}(\mathcal{G}, \ell_1, \ell_2, r_1, r_2)$, we only need to consider how the reduction works
for canonical solvers of $\mathsf{Rank}(\mathcal{G}, \ell_1, \ell_2, r_1, r_2)$.

Finally, it should be noticed that a canonical solver is completely characterized
by a probability vector $p_{\mathcal{A}} = (p_{\mathcal{A},i})_{i \in \mathbb{Z}^+}$, where
$p_{\mathcal{A},i} = \Pr[1 \leftarrow \mathcal{A}(M_{\mathcal{G}}) : M_{\mathcal{G}} \in_R \mathcal{G}^{\ell_1 \times \ell_2; i}]$.
The advantage of a canonical solver is then
$\mathsf{AdvRank}_{\mathcal{A}} = |p_{\mathcal{A},r_2} - p_{\mathcal{A},r_1}|$. Dealing with all canonical
solvers of $\mathsf{Rank}(\mathcal{G}, \ell_1, \ell_2, r_1, r_2)$ means considering all possible
probability vectors $p_{\mathcal{A}}$ such that $|p_{\mathcal{A},r_2} - p_{\mathcal{A},r_1}|$ is
non-negligible.


4.3 More Linear Algebra 

Let us see the implications of restricting the reductions to be algebraic. Since here
we reduce the decisional problem DDH to the
$\mathsf{Rank}(\mathcal{G}, \ell_1, \ell_2, r_1, r_2)$ problem, the reduction $\mathcal{R}$ will receive
as input either a 0-instance (i.e., $(1_{\mathcal{G}}, x_{\mathcal{G}}, y_{\mathcal{G}}, (xy)_{\mathcal{G}})$) or a
1-instance (i.e., $(1_{\mathcal{G}}, x_{\mathcal{G}}, y_{\mathcal{G}}, (xy + s)_{\mathcal{G}})$) of the decisional
problem (where $x, y, s \in_R \mathbb{Z}_q$). Regardless of the instance received, $\mathcal{R}$ will
compute a matrix $M_{\mathcal{G}} \in \mathcal{G}^{\ell_1 \times \ell_2}$ that
depends ‘algebraically’ on the input group elements. Therefore, for any value of
the random tape of $\mathcal{R}$ there exist matrices
$B_1, B_2, B_3, B_4 \in \mathbb{Z}_q^{\ell_1 \times \ell_2}$ such that

$$M = B_1 + xB_2 + yB_3 + (xy + s)B_4,$$

where either $s = 0$ or $s \in_R \mathbb{Z}_q$, depending on the type of instance received by
$\mathcal{R}$.

Therefore, we need some properties of the sets of matrices that are linear
combinations of some fixed matrices with coefficients that are multivariate
polynomials. The following lemma informally states that matrices in a linear variety
of $\mathbb{Z}_q^{\ell \times \ell}$ (of any dimension) are invertible with either zero or overwhelming
probability.

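Lemma 4. Let $\mathcal{M}$ be a coset of a $\mathbb{Z}_q$-vector subspace of
$\mathbb{Z}_q^{\ell \times \ell}$, that is, there exist matrices
$A, B_1, \ldots, B_k \in \mathbb{Z}_q^{\ell \times \ell}$ for some integer $k$ such that
$\mathcal{M} = \{A + x_1B_1 + \ldots + x_kB_k \mid x_1, \ldots, x_k \in \mathbb{Z}_q\}$. If
$GL_\ell(\mathbb{Z}_q) \cap \mathcal{M} \neq \emptyset$ then

$$\nu_{\mathcal{M}} = \frac{|GL_\ell(\mathbb{Z}_q) \cap \mathcal{M}|}{|\mathcal{M}|} \ge 1 - \frac{\ell}{q - 1}.$$

Proof.² Let us choose $A \in GL_\ell(\mathbb{Z}_q) \cap \mathcal{M}$ and let $\{B_1, \ldots, B_k\}$ be a
base of the vector space $\mathcal{M} - A$. In any line $\mathcal{L} \subseteq \mathcal{M}$ containing $A$
there can be at most $\ell$ matrices $M \in \mathcal{L}$ such that $\mathrm{rank}\, M < \ell$ (i.e.,
$\det M = 0$). Indeed, for any line $\mathcal{L}$ there is a nonzero vector
$x = (x_1, \ldots, x_k) \in \mathbb{Z}_q^k$ such that
$\mathcal{L} = \{A + \mu(x_1B_1 + \ldots + x_kB_k) \mid \mu \in \mathbb{Z}_q\}$. Therefore the
polynomial equation $\det(A + \mu(x_1B_1 + \ldots + x_kB_k)) = 0$, which is equivalent to
$Q_x(\mu) = \det(I_\ell + \mu(x_1B_1A^{-1} + \ldots + x_kB_kA^{-1})) = 0$, has at most $\ell$
roots, because $Q_x(0) = 1$ and, for $\mu \neq 0$,
$Q_x(\mu) = \mu^\ell \det(\lambda I_\ell + x_1B_1A^{-1} + \ldots + x_kB_kA^{-1})$ with
$\lambda = \mu^{-1}$, which vanishes if and only if $-\lambda$ is an eigenvalue of
$x_1B_1A^{-1} + \ldots + x_kB_kA^{-1}$. Finally, since there are exactly
$|\mathbb{P}^{k-1}(\mathbb{Z}_q)| = \frac{q^k - 1}{q - 1}$ different lines in $\mathcal{M}$ containing $A$,

$$\nu_{\mathcal{M}} \ge 1 - \frac{\ell}{q^k} \cdot \frac{q^k - 1}{q - 1} > 1 - \frac{\ell}{q - 1},$$

as $k$ is the dimension of the vector space $\mathcal{M} - A$, and then $|\mathcal{M}| = q^k$. □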
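For very small parameters the bound of Lemma 4 can be verified by exhaustive enumeration. The Python sketch below counts the invertible matrices in a two-dimensional coset of $\mathbb{Z}_7^{2 \times 2}$ containing the identity. The prime `q = 7`, the matrices `A`, `B1`, `B2` and the helper names are assumptions picked so that the count is easy to confirm by hand.

```python
from itertools import product


def rank_mod_q(rows, q):
    """Rank of an integer matrix over the prime field Z_q (Gaussian elimination)."""
    m = [[v % q for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, q)  # modular inverse, Python 3.8+
        m[rank] = [(v * inv) % q for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % q for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def invertible_count(A, Bs, q):
    """Return (#invertible, #distinct) matrices in the coset {A + sum_i x_i B_i}."""
    ell = len(A)
    seen = set()
    invertible = 0
    for xs in product(range(q), repeat=len(Bs)):
        M = tuple(
            tuple((A[i][j] + sum(x * B[i][j] for x, B in zip(xs, Bs))) % q
                  for j in range(ell))
            for i in range(ell))
        if M in seen:
            continue
        seen.add(M)
        invertible += rank_mod_q(M, q) == ell
    return invertible, len(seen)


q, ell = 7, 2
A = [[1, 0], [0, 1]]
B1 = [[0, 1], [0, 0]]
B2 = [[0, 0], [1, 0]]
inv_count, total = invertible_count(A, [B1, B2], q)
print(inv_count, total, inv_count / total, 1 - ell / (q - 1))
```

Here the coset consists of the matrices with rows $(1, x_1)$ and $(x_2, 1)$, which are singular exactly when $x_1 x_2 = 1$, so 6 of the 49 matrices are singular and the observed fraction is well above the bound $1 - \ell/(q-1)$.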

This lemma can be easily generalized to parametrical subsets of linear varieties
by replacing each variable $x_j$, $j = 1, \ldots, k$, by a multivariate polynomial
$p_j(y_1, \ldots, y_n) \in \mathbb{Z}_q[y_1, \ldots, y_n]$ (or simply, $\mathcal{M}$ is now the range of a
multivariate polynomial with matrix coefficients). Here we cannot ensure that the mapping
between the parameter vector $y = (y_1, \ldots, y_n)$ and the matrices in $\mathcal{M}$ is
one-to-one. Therefore we will define $\nu_{\mathcal{M}}$ as the probability of obtaining a full-rank
matrix when $y \in_R \mathbb{Z}_q^n$ is sampled with the uniform distribution.

Lemma 5. Let $\mathcal{M}$ be a subset of $\mathbb{Z}_q^{\ell \times \ell}$ defined as
$\mathcal{M} = \{p_1(y)B_1 + \ldots + p_k(y)B_k \mid y \in \mathbb{Z}_q^n\}$, where
$p_1(y), \ldots, p_k(y) \in \mathbb{Z}_q[y]$ are multivariate polynomials
of total degree at most $d$, and $B_1, \ldots, B_k \in \mathbb{Z}_q^{\ell \times \ell}$ for some integer
$k$. If $GL_\ell(\mathbb{Z}_q) \cap \mathcal{M} \neq \emptyset$ then

$$\nu_{\mathcal{M}} = \Pr[M \in GL_\ell(\mathbb{Z}_q) : M = p_1(y)B_1 + \ldots + p_k(y)B_k,\ y \in_R \mathbb{Z}_q^n] \ge 1 - \frac{\ell d}{q - 1} \cdot \frac{q^n - 1}{q^n} > 1 - \frac{\ell d}{q - 1}.$$

Proof. The proof is similar, but now we choose
$A = p_1(y_0)B_1 + \ldots + p_k(y_0)B_k \in GL_\ell(\mathbb{Z}_q) \cap \mathcal{M}$ and define the new
polynomials $q_i(z) = p_i(y_0 + z) - p_i(y_0)$ for $i = 1, \ldots, k$. Now,
$\mathcal{M} \setminus \{A\}$ is partitioned into subsets
$\mathcal{L}_z = \{A + q_1(\mu z)B_1 + \ldots + q_k(\mu z)B_k \mid \mu \in \mathbb{Z}_q^\times\}$, where
$z \in \mathbb{Z}_q^n \setminus \{0\}$, each one containing at most $\ell d$ singular matrices, since
the polynomial $Q_z(\mu) = \det(I_\ell + q_1(\mu z)B_1A^{-1} + \ldots + q_k(\mu z)B_kA^{-1})$ is
nonzero (as $Q_z(0) = 1$), and it has degree at most $d\ell$.
Finally, the claimed inequality follows from the fact that there are $(q^n - 1)/(q - 1)$
different subsets $\mathcal{L}_z$. □

The above lemmas refer only to invertible matrices but a similar result applies 
to (even rectangular) matrices with respect to a specific value of the rank. 

2 This lemma and the following one can alternatively be proved by using the Schwartz
lemma H3 (also referred to as the Schwartz-Zippel lemma).

Lemma 6. Let $\mathcal{M}$ be a subset of $\mathbb{Z}_q^{\ell_1\times\ell_2}$ defined as $\mathcal{M} = \{p_1(y)B_1 + \ldots + p_k(y)B_k \mid y \in \mathbb{Z}_q^n\}$,
where $p_1(y), \ldots, p_k(y) \in \mathbb{Z}_q[y]$ are multivariate polynomials
of total degree at most $d$, and $B_1, \ldots, B_k \in \mathbb{Z}_q^{\ell_1\times\ell_2}$ for some integer $k$. If
$r_m = \max_{M \in \mathcal{M}} \operatorname{rank} M$ then,

$$\nu_{\mathcal{M}} = \Pr[\operatorname{rank} M = r_m : M = p_1(y)B_1 + \ldots + p_k(y)B_k,\ y \in_R \mathbb{Z}_q^n] > 1 - \frac{r_m d}{q-1}$$


Proof. We just apply the previous lemma to a projection of the set $\mathcal{M}$. First
choose $M_0 \in \mathcal{M}$ such that $\operatorname{rank} M_0 = r_m$ and find matrices $L \in \mathbb{Z}_q^{r_m\times\ell_1;r_m}$ and
$R \in \mathbb{Z}_q^{\ell_2\times r_m;r_m}$ such that $\operatorname{rank} LM_0R = r_m$, that is, $LM_0R \in GL_{r_m}(\mathbb{Z}_q)$. These
matrices are easy to build, since by a previous lemma there exist $L_0 \in \mathbb{Z}_q^{\ell_1\times r_m;r_m}$
and $R_0 \in \mathbb{Z}_q^{r_m\times\ell_2;r_m}$ such that $M_0 = L_0R_0$. Therefore, we take any $L$ such that
$LL_0 \in GL_{r_m}(\mathbb{Z}_q)$. For instance, take $L$ as the all-zero matrix and put $r_m$ ones
in its main diagonal, in positions corresponding to $r_m$ linearly independent rows
of $L_0$. We proceed similarly with $R_0$ and $R$.

Now, the projected set $\mathcal{M}' = \{LMR \mid M \in \mathcal{M}\}$ fulfils the conditions of
Lemma 5 and it contains at least one invertible matrix $LM_0R$. Thus,

$$\nu_{\mathcal{M}'} = \Pr[M' \in GL_{r_m}(\mathbb{Z}_q) : M' = L(p_1(y)B_1 + \ldots + p_k(y)B_k)R,\ y \in_R \mathbb{Z}_q^n] > 1 - \frac{r_m d}{q-1}$$

Moreover, since $\operatorname{rank}(LMR) \leq \operatorname{rank} M \leq r_m$ for all $M \in \mathcal{M}$, then $\operatorname{rank}(LMR) = r_m$
implies $\operatorname{rank} M = r_m$, and

$$\Pr[\operatorname{rank} M = r_m : M = p_1(y)B_1 + \ldots + p_k(y)B_k,\ y \in_R \mathbb{Z}_q^n] \geq \nu_{\mathcal{M}'} > 1 - \frac{r_m d}{q-1}$$

□


This lemma basically says that in a set $\mathcal{M}$ defined and sampled as above the
matrices have a specific rank (the maximal rank in the set) with overwhelming
probability, and ranks below the maximal one occur only with negligible
probability.


4.4 The Case of DDH 

Now let us consider the specific case of the sets $\mathcal{M}_0$ and $\mathcal{M}_1$ generated by a
black-box algebraic reduction $\mathcal{R}$ from a DDH 0-tuple or 1-tuple, respectively, for
a fixed random tape of $\mathcal{R}$. More precisely, $\mathcal{M}_{\text{DDH-0}} = \{B_0 + xB_1 + yB_2 + xyB_3 \mid x, y \in \mathbb{Z}_q\}$,
while $\mathcal{M}_{\text{DDH-1}} = \{B_0 + xB_1 + yB_2 + (xy + s)B_3 \mid x, y, s \in \mathbb{Z}_q\}$, for
some matrices $B_0, B_1, B_2, B_3 \in \mathbb{Z}_q^{\ell_1\times\ell_2}$ that could depend on the random tape.
Let $r_{m0}$ and $r_{m1}$ be the maximal ranks respectively in $\mathcal{M}_{\text{DDH-0}}$ and $\mathcal{M}_{\text{DDH-1}}$.
Since the former is a subset of the latter, $r_{m0} \leq r_{m1}$. In addition, it is clear
that $\operatorname{rank} B_0 \leq r_{m0}$, but one can also prove that $\operatorname{rank} B_3 \leq r_{m0}$ and therefore
$r_{m1} \leq 2r_{m0}$, as claimed in the following lemma.

Lemma 7. Let $r_{m0}$ and $r_{m1}$ be the maximal ranks respectively in $\mathcal{M}_{\text{DDH-0}}$ and
$\mathcal{M}_{\text{DDH-1}}$. Then $r_{m0} \leq r_{m1} \leq 2r_{m0}$.


Proof. The left inequality is trivial, as mentioned above. To prove the right
one we first use Lemma 6 to show that $\operatorname{rank} B_3 \leq r_{m0}$. Indeed, the subset
$\mathcal{M}^*_{\text{DDH-0}} = \{B_0 + xB_1 + yB_2 + xyB_3 \mid x, y \in \mathbb{Z}_q^\times\}$ differs from $\mathcal{M}_{\text{DDH-0}}$
in that a negligible fraction of it has been removed. Therefore, the probability
distributions on both sets (induced by uniformly sampling $x$ and $y$) are
statistically close. Since for all $x, y \in \mathbb{Z}_q^\times$, $\operatorname{rank}(B_0 + xB_1 + yB_2 + xyB_3) =
\operatorname{rank}(\frac{1}{xy}B_0 + \frac{1}{y}B_1 + \frac{1}{x}B_2 + B_3)$, and the inversion map $x \mapsto 1/x$ is a bijection
in $\mathbb{Z}_q^\times$, the probability distributions of the ranks in $\mathcal{M}^*_{\text{DDH-0}}$ and in
$\widetilde{\mathcal{M}}^*_{\text{DDH-0}} = \{B_3 + xB_2 + yB_1 + xyB_0 \mid x, y \in \mathbb{Z}_q^\times\}$ are identical. Therefore, matrices in
$\widetilde{\mathcal{M}}_{\text{DDH-0}} = \{B_3 + xB_2 + yB_1 + xyB_0 \mid x, y \in \mathbb{Z}_q\}$ have rank $r_{m0}$ with overwhelming
probability. Moreover, by Lemma 6, $r_{m0}$ is precisely the maximal rank in
$\widetilde{\mathcal{M}}_{\text{DDH-0}}$ and then $\operatorname{rank} B_3 \leq r_{m0}$.³

Finally, observe that for any $M \in \mathcal{M}_{\text{DDH-1}}$, $M = B_0 + xB_1 + yB_2 + (xy + s)B_3 =
(B_0 + xB_1 + yB_2 + xyB_3) + sB_3$ and $\operatorname{rank} M \leq \operatorname{rank}(B_0 + xB_1 + yB_2 + xyB_3) +
\operatorname{rank}(sB_3) \leq 2r_{m0}$, because $B_0 + xB_1 + yB_2 + xyB_3 \in \mathcal{M}_{\text{DDH-0}}$. □
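The upper bound of Lemma 7 can be met with equality. As a minimal sketch, assume the four matrices are block-diagonal copies of the 2×2 matrix units, so that every block equals [[1, x], [y, xy + s]] and has determinant s. The prime q = 5 and the number of copies are hypothetical toy choices, and the maximal ranks are found by exhaustive enumeration.

```python
import itertools


def rank_mod_p(matrix, p):
    """Rank of an integer matrix over the prime field Z_p (Gauss-Jordan elimination)."""
    rows = [[v % p for v in row] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [(v * inv) % p for v in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(a - factor * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def block_diag(block, copies):
    """Block-diagonal matrix made of `copies` copies of a square block."""
    n = len(block)
    out = [[0] * (n * copies) for _ in range(n * copies)]
    for c in range(copies):
        for i in range(n):
            for j in range(n):
                out[c * n + i][c * n + j] = block[i][j]
    return out


def max_rank(q, mats, with_shift):
    """Maximal rank of B0 + x*B1 + y*B2 + (x*y + s)*B3 over Z_q (s = 0 if no shift)."""
    size = len(mats[0])
    shifts = range(q) if with_shift else (0,)
    best = 0
    for x, y, s in itertools.product(range(q), range(q), shifts):
        coeffs = (1, x, y, x * y + s)
        m = [[sum(c * b[i][j] for c, b in zip(coeffs, mats)) % q
              for j in range(size)] for i in range(size)]
        best = max(best, rank_mod_p(m, q))
    return best


q, copies = 5, 2  # hypothetical toy parameters
units = ([[1, 0], [0, 0]], [[0, 1], [0, 0]], [[0, 0], [1, 0]], [[0, 0], [0, 1]])
mats = [block_diag(u, copies) for u in units]  # B0, B1, B2, B3

r_m0 = max_rank(q, mats, with_shift=False)  # maximal rank in the DDH-0 set
r_m1 = max_rank(q, mats, with_shift=True)   # maximal rank in the DDH-1 set
print(r_m0, r_m1)
```

With two copies the enumeration gives maximal ranks 2 and 4, so r_m1 = 2 r_m0 for this family.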


The previous discussion deals with a fixed arbitrary random tape of the reduction
$\mathcal{R}$. However, the overall performance of $\mathcal{R}$ depends on the aggregation of
the contributions of all possible values of the random tape. Technically, given a
particular canonical solver $\mathcal{A}$ of Rank$(\mathcal{G}, \ell_1, \ell_2, r_1, r_2)$, described by its probability
vector $p_{\mathcal{A}}$ as defined in an earlier section, the advantage of $\mathcal{R}[\mathcal{A}]$ can be computed
as

$$\mathrm{AdvDDH}_{\mathcal{R}[\mathcal{A}]}(\mathcal{G}) = \Big|\sum_{r} (\pi_{0,r} - \pi_{1,r})\, p_{\mathcal{A},r}\Big| = |(\pi_0 - \pi_1) \cdot p_{\mathcal{A}}|$$

where

$$\pi_{0,r} = \Pr[\operatorname{rank} M = r : M \leftarrow \mathcal{R}(1g, xg, yg, xyg),\ x, y \in_R \mathbb{Z}_q]$$

and

$$\pi_{1,r} = \Pr[\operatorname{rank} M = r : M \leftarrow \mathcal{R}(1g, xg, yg, (xy + s)g),\ x, y, s \in_R \mathbb{Z}_q]$$

For convenience, we also introduce the cumulative probabilities $\Pi_{b,r} = \sum_{i=0}^{r} \pi_{b,i}$,
$b \in \{0, 1\}$.

Since the reduction $\mathcal{R}$ must work for any successful solver $\mathcal{A}$, for every probability
vector $p_{\mathcal{A}}$ such that $|p_{\mathcal{A},r_1} - p_{\mathcal{A},r_2}| = \mathrm{AdvRank}_{\mathcal{A}}(\mathcal{G}, \ell_1, \ell_2, r_1, r_2)$ is
non-negligible, the advantage $\mathrm{AdvDDH}_{\mathcal{R}[\mathcal{A}]}(\mathcal{G})$ must also be non-negligible. This
implies the existence of $\alpha > \mathrm{negl}(\lambda)$ such that⁴

$$\begin{aligned}
|\pi_{0,r} - \pi_{1,r}| &\in \mathrm{negl}(\lambda) \quad \forall r \notin \{r_1, r_2\} \\
|\pi_{0,r_1} - \pi_{1,r_1}| &= \alpha \\
|\pi_{0,r_2} - \pi_{1,r_2}| &= \alpha \pm \mathrm{negl}(\lambda)
\end{aligned} \qquad (1)$$

3 A very similar trick also shows that $\operatorname{rank} B_1$ and $\operatorname{rank} B_2$ are at most $r_{m0}$. However,
it is not clear how to extend this argument to arbitrary multivariate polynomials.

4 To prove it, consider the fact that there cannot exist any probability vector $p_{\mathcal{A}}$
orthogonal to $\pi_0 - \pi_1$ such that $|p_{\mathcal{A},r_1} - p_{\mathcal{A},r_2}| > \mathrm{negl}(\lambda)$.


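Moreover,

$$\mathrm{AdvDDH}_{\mathcal{R}[\mathcal{A}]}(\mathcal{G}) \leq |p_{\mathcal{A},r_1} - p_{\mathcal{A},r_2}|\,\alpha + \mathrm{negl}(\lambda) = \alpha\,\mathrm{AdvRank}_{\mathcal{A}}(\mathcal{G}, \ell_1, \ell_2, r_1, r_2) + \mathrm{negl}(\lambda)$$

All that remains is to find an upper bound of the reduction loss-factor $\alpha$.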

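By Lemma 6 we know that for every value of the random tape,
$\Pr[\operatorname{rank} M < r_{mb} : M \leftarrow \mathcal{M}_{\text{DDH-}b}] \in \mathrm{negl}(\lambda)$ for $b \in \{0, 1\}$, and by definition of $r_{mb}$,
$\Pr[\operatorname{rank} M \leq r_{mb} : M \leftarrow \mathcal{M}_{\text{DDH-}b}] = 1$. Therefore, considering all values of
the random tape of $\mathcal{R}$,⁵

$$\Pi_{b,i} = \Pr[r_{mb} \leq i] + \mathrm{negl}(\lambda), \quad b \in \{0, 1\} \qquad (2)$$

where now $r_{m0}$ and $r_{m1}$ are random variables. By Lemma 7, $r_{m0} \leq r_{m1} \leq 2r_{m0}$,
which implies $\Pr[r_{m1} \leq i] \leq \Pr[r_{m0} \leq i] \leq \Pr[r_{m1} \leq 2i]$ for arbitrary $i$,⁶ and
by (2),

$$\Pi_{1,i} - \mathrm{negl}(\lambda) \leq \Pi_{0,i} \leq \Pi_{1,2i} + \mathrm{negl}(\lambda) \qquad (3)$$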

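Now, using the left hand side of (3) for $i = r_1$ we get $\Pi_{1,r_1} \leq \Pi_{0,r_1} + \mathrm{negl}(\lambda)$, and
combined with (1), we obtain $\pi_{0,r_1} = \pi_{1,r_1} + \alpha$ and $\pi_{1,r_2} \leq \pi_{0,r_2} + \alpha + \mathrm{negl}(\lambda)$.
In addition, for any $i$ such that $r_1 \leq i < r_2$,

$$\Pi_{0,i} = \Pi_{1,i} + \alpha \pm \mathrm{negl}(\lambda) \qquad (4)$$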

Let us assume now that $r_2 > 2^k r_1$ for some $k \geq 1$. Then, applying the right
hand side of (3) and (4),

$$\Pi_{0,2^k r_1} = \Pi_{1,2^k r_1} + \alpha \pm \mathrm{negl}(\lambda) \geq \Pi_{0,2^{k-1} r_1} + \alpha - \mathrm{negl}(\lambda)$$

and by induction,

$$\Pi_{0,2^k r_1} \geq \Pi_{0,r_1} + k\alpha - \mathrm{negl}(\lambda) \geq (k + 1)\alpha - \mathrm{negl}(\lambda)$$

where (1) is used again in the last step.

Finally, since the leftmost sum is upper bounded by 1,

$$\alpha \leq \frac{1 + \mathrm{negl}(\lambda)}{k + 1}$$

for any $k < \log_2 r_2 - \log_2 r_1$. Therefore,

$$\alpha \leq \frac{1 + \mathrm{negl}(\lambda)}{\lceil \log_2 r_2 - \log_2 r_1 \rceil}$$

The above discussion proves the following theorem. 

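5 If $r_{mb} \leq i$ then $\operatorname{rank} M \leq i$ with probability 1. Otherwise, $\operatorname{rank} M \leq i$ only with
negligible probability.

6 Observe that $r_{m1} \leq i \Rightarrow r_{m0} \leq i \Rightarrow r_{m1} \leq 2i$.

Theorem 2. For any $\ell_1, \ell_2, r_1, r_2$ such that $1 \leq r_1 < r_2 \leq \min(\ell_1, \ell_2)$ and any
time-preserving black-box algebraic reduction $\mathcal{R}$ of DDH$(\mathcal{G})$ to the
Rank$(\mathcal{G}, \ell_1, \ell_2, r_1, r_2)$ problem, any canonical Rank solver $\mathcal{A}$ and the corresponding DDH
solver $\mathcal{R}[\mathcal{A}]$ fulfil

$$\mathrm{AdvRank}_{\mathcal{A}}(\mathcal{G}, \ell_1, \ell_2, r_1, r_2; t) \geq \lceil \log_2 r_2 - \log_2 r_1 \rceil\, \mathrm{AdvDDH}_{\mathcal{R}[\mathcal{A}]}(\mathcal{G}; t') - \mathrm{negl}(\lambda)$$

where the running times $t$, $t'$ are similar.

□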
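To get a feeling for the size of the factor $\lceil \log_2 r_2 - \log_2 r_1 \rceil$ derived above, the following sketch evaluates it with exact integer arithmetic. The function name is a hypothetical helper introduced only for this illustration.

```python
def ddh_loss_factor(r1, r2):
    """Return ceil(log2(r2) - log2(r1)), computed without floating point.

    This is the smallest integer c with 2**c * r1 >= r2.
    """
    if not 1 <= r1 < r2:
        raise ValueError("expected 1 <= r1 < r2")
    c = 0
    while (r1 << c) < r2:
        c += 1
    return c


for r1, r2 in [(1, 2), (1, 8), (1, 9), (3, 4), (16, 1024)]:
    print(r1, r2, ddh_loss_factor(r1, r2))
```

For instance, distinguishing rank 1 from rank 8 gives a factor of 3, so under the stated conditions the DDH advantage obtained through such a reduction is at most about one third of the Rank advantage.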


5 Reductions of Other Decisional Problems 

We consider now other well-known computational problems, namely the Decisional
Linear (DLin) P and the Decisional 3-Party Diffie-Hellman (D3DH) )3lbl9j
problems.

The techniques described above can be applied to these problems by defining
a suitable basic matrix block $M$ (of suitable size) in which the problem instance
is embedded, and using as many copies of it as possible. More precisely, we call
algebraic any decisional problem (such as DDH, DLin or D3DH) in which the
problem instance is defined by a tuple of elements in a (cyclic) group whose
discrete logarithms either fulfil or do not fulfil a specific algebraic equation. The
problem instance is embedded into the matrix $M$ by rewriting the algebraic equation
as $\det M = 0$.


5.1 The Decisional Linear Problem 

The Decisional Linear problem consists in distinguishing between the distributions
$(xg, yg, zg, tg, (x^{-1}z + y^{-1}t)g) \in \mathcal{G}^5$ and $(xg, yg, zg, tg, ug) \in \mathcal{G}^5$, where
$x, y, z, t, u \in_R \mathbb{Z}_q$ are chosen independently and uniformly at random. More formally,
we consider the following two experiments between a challenger and a
distinguisher $\mathcal{A}$.

Experiment $\mathrm{ExpDLin}^b_{\mathcal{A}}(\mathcal{G})$ is defined as follows, for $b = 0, 1$.

1. The challenger chooses random $x, y, z, t, u \in_R \mathbb{Z}_q$. If $b = 0$, the challenger
sends the tuple $(1g, xg, yg, zg, tg, (x^{-1}z + y^{-1}t)g) \in \mathcal{G}^6$ to $\mathcal{A}$. Otherwise, it
sends the tuple $(1g, xg, yg, zg, tg, ug) \in \mathcal{G}^6$.

2. The distinguisher $\mathcal{A}$ outputs a bit $b' \in \{0, 1\}$.

Let $\Omega_b$ be the event that $\mathcal{A}$ outputs $b' = 1$ in $\mathrm{ExpDLin}^b_{\mathcal{A}}(\mathcal{G})$. The advantage of
$\mathcal{A}$ is $\mathrm{AdvDLin}_{\mathcal{A}}(\mathcal{G}) = |\Pr[\Omega_0] - \Pr[\Omega_1]|$. We can then define
$\mathrm{AdvDLin}(\mathcal{G}; t) = \max_{\mathcal{A}} \{\mathrm{AdvDLin}_{\mathcal{A}}(\mathcal{G})\}$, where the maximum is taken over all $\mathcal{A}$ running within
time $t$.

Definition 2 (DLin). The Decisional Linear assumption in a group $\mathcal{G}$ states
that $\mathrm{AdvDLin}(\mathcal{G}; t)$ is negligible in $\lambda = \log|\mathcal{G}|$ for any value of $t$ that is polynomial
in $\lambda$.

Lemma 8. Any distinguisher for Rank$(\mathcal{G}, \ell_1, \ell_2, k - \delta, k)$, $\ell_1, \ell_2 \geq 3$, $k \geq 3$,
$1 \leq \delta \leq \lfloor k/3 \rfloor$, can be converted into a distinguisher for the Decisional Linear
(DLin) problem, with the same advantage and running essentially within the
same time.

Proof. Given a DLin instance $(1, x, y, z, t, u)g$ the DLin distinguisher builds the
$\ell_1 \times \ell_2$ matrix


$$M g = \cdots \oplus I_{k-3\delta}\, g \oplus 0_{(\ell_1-k)\times(\ell_2-k)}\, g$$

and submits the randomized matrix $L\,Mg\,R$ to the Rank$(\mathcal{G}, \ell_1, \ell_2, k - \delta, k)$ distinguisher,
where $L \in_R GL_{\ell_1}(\mathbb{Z}_q)$ and $R \in_R GL_{\ell_2}(\mathbb{Z}_q)$. Notice that if
$u = x^{-1}z + y^{-1}t \bmod q$ then the resulting matrix is a random matrix in $\mathcal{G}^{\ell_1\times\ell_2;k-\delta}$.
Otherwise, it is a random matrix in $\mathcal{G}^{\ell_1\times\ell_2;k}$. □
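One way to see how a DLin instance can drive the rank of a 3×3 block is sketched below. The block [[x, 0, z], [0, y, t], [1, 1, u]] is an assumption made for this illustration and is not claimed to be the block used in the proof: its determinant is xyu - xt - yz, which vanishes exactly when u = x^{-1}z + y^{-1}t for nonzero x and y. The check works on exponents in Z_q for a hypothetical small prime q = 7, whereas a real reduction would only see the corresponding group elements.

```python
import itertools


def rank_mod_p(matrix, p):
    """Rank of an integer matrix over the prime field Z_p (Gauss-Jordan elimination)."""
    rows = [[v % p for v in row] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [(v * inv) % p for v in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(a - factor * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def dlin_block(x, y, z, t, u):
    """Candidate 3x3 block (an assumption): det = x*y*u - x*t - y*z."""
    return [[x, 0, z], [0, y, t], [1, 1, u]]


def check_dlin_block(q):
    """Exhaustively check: rank 2 on DLin tuples, rank 3 otherwise (x, y nonzero)."""
    for x, y in itertools.product(range(1, q), repeat=2):
        x_inv, y_inv = pow(x, -1, q), pow(y, -1, q)
        for z, t, u in itertools.product(range(q), repeat=3):
            is_dlin = (u - x_inv * z - y_inv * t) % q == 0
            expected = 2 if is_dlin else 3
            if rank_mod_p(dlin_block(x, y, z, t, u), q) != expected:
                return False
    return True


ok = check_dlin_block(7)
print(ok)
```

Every entry of this block is 0, 1 or one of the instance exponents, so the matching matrix of group elements can be written down from the tuple alone.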

Theorem 3. For any $\ell_1, \ell_2, r_1, r_2$ such that $2 \leq r_1 < r_2 \leq \min(\ell_1, \ell_2)$,

AdvRank(e,G,G,ri,r 2 ;t) < \ log ^ r ^ ~ lo ^ 3ri ~ 2 ) j AdvDLin(S;f / ) < 

| log 3 — log 2 | 

< |"l.711og 2 AdvDLin(5; t') 

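Proof. We can apply a hybrid argument similar to the one used in a previous theorem. Let
us consider the sequence of integers $\{n_i\}$ defined by the recurrence $n_0 = r_1$ and
$n_{i+1} = \lfloor 3n_i/2 \rfloor$, and let $k$ be the smallest index such that $n_k \geq r_2$. Then define a
sequence of random matrices $\{M_i g\}$, where $M_i \in_R \mathbb{Z}_q^{\ell_1\times\ell_2;n_i}$ for $i = 0, \ldots, k - 1$,
and $M_k \in_R \mathbb{Z}_q^{\ell_1\times\ell_2;r_2}$. For any distinguisher $\mathcal{A}_{\text{Rank}}$ with running time upper
bounded by $t$, let $p_i = \Pr[1 \leftarrow \mathcal{A}_{\text{Rank}}(M_i g)]$. By Lemma 8,

$$|p_{i+1} - p_i| = \mathrm{AdvRank}_{\mathcal{A}_{\text{Rank}}}(\mathcal{G}, \ell_1, \ell_2, n_i, n_{i+1}) \leq \mathrm{AdvDLin}(\mathcal{G}; t')$$

for $i = 0, \ldots, k - 2$, and

$$|p_k - p_{k-1}| = \mathrm{AdvRank}_{\mathcal{A}_{\text{Rank}}}(\mathcal{G}, \ell_1, \ell_2, n_{k-1}, r_2) \leq \mathrm{AdvDLin}(\mathcal{G}; t')$$

Therefore,

$$\mathrm{AdvRank}_{\mathcal{A}_{\text{Rank}}}(\mathcal{G}, \ell_1, \ell_2, r_1, r_2) = |p_k - p_0| \leq |p_1 - p_0| + \cdots + |p_k - p_{k-1}| \leq k \cdot \mathrm{AdvDLin}(\mathcal{G}; t')$$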
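The number of hybrid steps can be made concrete with a short sketch. It assumes the step rule $n_{i+1} = \lfloor 3n_i/2 \rfloor$, which is the largest jump that the condition $1 \leq \delta \leq \lfloor k/3 \rfloor$ of Lemma 8 permits when moving from rank $k - \delta$ to rank $k$, and it caps the last element at $r_2$. The function name is hypothetical.

```python
def hybrid_ranks(r1, r2):
    """Ranks visited by the hybrid argument, assuming n_{i+1} = floor(3 * n_i / 2).

    The final element is capped at r2, mirroring the last hybrid step.
    """
    if not 2 <= r1 < r2:
        raise ValueError("expected 2 <= r1 < r2")
    ranks = [r1]
    while ranks[-1] < r2:
        ranks.append(min((3 * ranks[-1]) // 2, r2))
    return ranks


def steps_are_valid(ranks):
    """Each step a -> b must satisfy Lemma 8's condition 1 <= b - a <= floor(b / 3)."""
    return all(1 <= b - a <= b // 3 for a, b in zip(ranks, ranks[1:]))


seq = hybrid_ranks(2, 10)
print(seq, "steps:", len(seq) - 1)
```

The number of steps `len(seq) - 1` is the multiplicative loss k that appears in the final inequality of the hybrid argument.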

On the other hand, as then n k > (D* ( r i — |) which implies that 
The optimality of the reduction presented above can be analyzed with the same
tools described in Section 4 but adapting some parts of Subsection 4.4. First of
all, we can describe the 0-instances and the 1-instances for the DLin problem in a
slightly different way. Namely, $\mathcal{M}_{\text{DLin-0}} = \{B_1 + xB_2 + yB_3 + x\alpha B_4 + y\beta B_5 + (\alpha + \beta)B_6 \mid x, y, \alpha, \beta \in \mathbb{Z}_q\}$,
while $\mathcal{M}_{\text{DLin-1}} = \{B_1 + xB_2 + yB_3 + x\alpha B_4 + y\beta B_5 + (\alpha + \beta + s)B_6 \mid x, y, \alpha, \beta, s \in \mathbb{Z}_q\}$,
for some matrices $B_1, B_2, B_3, B_4, B_5, B_6 \in \mathbb{Z}_q^{\ell_1\times\ell_2}$
that could depend on the random tape of the reduction. By a similar trick one
can manage to reprove Lemma 7 also for DLin and the rest of the analysis
works equally well. The trick in this case is excluding the case $\alpha + \beta = 0$ (which
affects a negligible fraction of the matrices) and then using a more elaborate
bijection which transforms $B_1 + xB_2 + yB_3 + x\alpha B_4 + y\beta B_5 + (\alpha + \beta)B_6$ into
$\gamma B_1 + x\gamma B_2 + y\gamma B_3 + x\alpha\gamma B_4 + y(1 - \alpha\gamma)B_5 + B_6$, where $\gamma = 1/(\alpha + \beta)$.

However, the logarithmic expression (which is identical to the one in Theorem 2)
for the maximal loss-factor in the reduction is different from the loss-factor
in the above reduction, leaving a gap that could mean that a better ‘natural’
reduction is still possible. Nevertheless, the authors think that a more detailed
analysis of the maximal ranks $r_{m0}$ and $r_{m1}$ could be possible, which would improve
the negative result obtained here.


5.2 The D3DH Problem 

The Decisional 3-Party Diffie-Hellman (D3DH) problem [3lfil9j consists in telling
apart the two distributions $(xg, yg, zg, (xyz)g) \in \mathcal{G}^4$ and $(xg, yg, zg, tg) \in \mathcal{G}^4$,
where $x, y, z, t \in_R \mathbb{Z}_q$ are chosen independently at random. The problem is formally
defined through the following two experiments between a challenger and
a distinguisher $\mathcal{A}$.

Experiment $\mathrm{ExpD3DH}^b_{\mathcal{A}}(\mathcal{G})$ is defined as follows, for $b = 0, 1$.

1. The challenger chooses random $x, y, z, t \in_R \mathbb{Z}_q$. If $b = 0$, the challenger sends
the tuple $(1g, xg, yg, zg, (xyz)g) \in \mathcal{G}^5$ to $\mathcal{A}$. Otherwise, it sends the tuple
$(1g, xg, yg, zg, tg) \in \mathcal{G}^5$.

2. The distinguisher $\mathcal{A}$ outputs a bit $b' \in \{0, 1\}$.

Let $\Omega_b$ be the event that $\mathcal{A}$ outputs $b' = 1$ in $\mathrm{ExpD3DH}^b_{\mathcal{A}}(\mathcal{G})$. The advantage
of $\mathcal{A}$ is $\mathrm{AdvD3DH}_{\mathcal{A}}(\mathcal{G}) = |\Pr[\Omega_0] - \Pr[\Omega_1]|$ and we define
$\mathrm{AdvD3DH}(\mathcal{G}; t) = \max_{\mathcal{A}} \{\mathrm{AdvD3DH}_{\mathcal{A}}(\mathcal{G})\}$, where the maximum is taken over all $\mathcal{A}$ running
within time $t$.

Definition 3. The Decisional 3-Party Diffie-Hellman assumption in a group $\mathcal{G}$
states that $\mathrm{AdvD3DH}(\mathcal{G}; t)$ is negligible in $\lambda = \log|\mathcal{G}|$ for any value of $t$ that
is polynomial in $\lambda$.

Similar to the Decisional Linear problem, it turns out that the D3DH problem
is easier than the Rank problem.

Theorem 4. For any $\ell_1, \ell_2, r_1, r_2$ such that $2 \leq r_1 < r_2 \leq \min(\ell_1, \ell_2)$,


AdvRank(^,A,4,n,r 2 ;i) < P° g(3 ^ 2) 0 l0 f^ — ^1 AdvD3DH(£7; t') < 

log 3 — log 2 



Proof. The proof only differs from the proof of Theorem 3 in the 3×3 blocks
built from a problem instance, in the proof of Lemma 8. Indeed, given the D3DH
instance $(1, x, y, z, t)g$ the matrix



has rank 2 or 3 depending on whether t = xyz mod q. 


□ 
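A 3×3 block with the stated property can be checked exhaustively in a toy setting. The block [[x, 1, 0], [0, y, 1], [-t, 0, z]] is an assumption made for this sketch and is not claimed to be the block used in the proof: its determinant is xyz - t, and its first two rows are always linearly independent, so its rank is 2 when t = xyz mod q and 3 otherwise. The prime q = 7 is a hypothetical small parameter.

```python
import itertools


def rank_mod_p(matrix, p):
    """Rank of an integer matrix over the prime field Z_p (Gauss-Jordan elimination)."""
    rows = [[v % p for v in row] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [(v * inv) % p for v in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(a - factor * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def d3dh_block(x, y, z, t):
    """Candidate 3x3 block (an assumption): det = x*y*z - t."""
    return [[x, 1, 0], [0, y, 1], [-t, 0, z]]


def check_d3dh_block(q):
    """Exhaustively check: rank 2 iff t = x*y*z mod q, rank 3 otherwise."""
    for x, y, z, t in itertools.product(range(q), repeat=4):
        expected = 2 if (t - x * y * z) % q == 0 else 3
        if rank_mod_p(d3dh_block(x, y, z, t), q) != expected:
            return False
    return True


ok = check_d3dh_block(7)
print(ok)
```

As in the DLin case, every entry is a constant or a (possibly negated) instance exponent, so the block of group elements can be built from the tuple without knowing any discrete logarithm.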


The analysis of the optimality of this reduction is comparable to the case of the
Decisional Linear problem. Here the sets of matrices are
$\mathcal{M}_{\text{D3DH-0}} = \{B_1 + xB_2 + yB_3 + zB_4 + xyzB_5 \mid x, y, z \in \mathbb{Z}_q\}$ and
$\mathcal{M}_{\text{D3DH-1}} = \{B_1 + xB_2 + yB_3 + zB_4 + (xyz + s)B_5 \mid x, y, z, s \in \mathbb{Z}_q\}$,
for some matrices $B_1, B_2, B_3, B_4, B_5 \in \mathbb{Z}_q^{\ell_1\times\ell_2}$
that could depend on the random tape of the reduction. The same gap between
the constructive and negative results is obtained.

5.3 Further Generalizations 

The ideas presented above, both the constructive and the negative results for
reductions of some decisional problems to the Rank problem, seem to be easily
applicable to a wide class of decisional problems. On the one hand, the construction
of a reduction to the Rank problem only needs a way to encode the
difference between the 0-instance and the 1-instance of the problem as the determinant of
a square matrix $M$ built up from the group elements in the instances. Typically
a 0-instance corresponds to $\det M = 0$. Following this approach, it is straightforward
to obtain efficient reductions, for instance, for the family of Decisional
$r$-Linear Problems, with arbitrary $r$.

On the other hand, the negative results about the existence of efficient reductions
also rely on algebraic considerations, mainly related to the sets $\mathcal{M}$, which
can be seen as special affine algebraic varieties. It is an open problem to obtain
a description of a wide class of algebraic decisional problems for which a general 
negative result can be derived. 

In this paper, only prime order groups are considered. However, it would be
interesting to investigate whether the techniques presented here can be applied to
composite order groups, where the matrices involved in the analysis are defined
over rings, which can introduce extra difficulties when dealing with notions
such as the rank and random self-reducibility.
Abstract. In the auxiliary input model an adversary is allowed to see
a computationally hard-to-invert function of the secret key. The auxil- 
iary input model weakens the bounded leakage assumption commonly 
made in leakage resilient cryptography as the hard-to-invert function 
may information-theoretically reveal the entire secret key. In this work, 
we propose the first constructions of digital signature schemes that are 
secure in the auxiliary input model. Our main contribution is a digi- 
tal signature scheme that is secure against chosen message attacks when 
given an exponentially hard-to-invert function of the secret key. As a sec- 
ond contribution, we construct a signature scheme that achieves security 
for random messages assuming that the adversary is given a polynomial-time
hard-to-invert function. Here, polynomial hardness is required even
when given the entire public key (so-called weak auxiliary input security).
We show that such signature schemes readily give us auxiliary input
secure identification schemes. 


1 Introduction 

Modern cryptography analyzes the security of cryptographic algorithms in the 
black-box model. An adversary may view the algorithm’s inputs and outputs, but 
the secret key as well as all the internal computation remains perfectly hidden. 
Unfortunately, the assumption of perfectly hidden keys does not reflect prac- 
tice where keys frequently get compromised for various reasons. An important 
example is side-channel attacks that exploit information leakage from the imple- 
mentation of an algorithm. Side-channel attacks do not only allow the adversary 
to gain partial knowledge of the secret key thereby making security proofs less 
meaningful, but in many cases may result in complete security breaches. 
In recent years, significant progress has been made within the theory
community to incorporate information leakage into the black-box model
(cf. QI3IEllHinilI2IZI3l2I] and many more). To this end, these works develop
new models to formally describe the information leakage, and design new schemes 
that can be proven secure therein. The leakage is typically characterized by a 
leakage function h that takes as input the secret key sk and reveals h( sk) — the 
so-called leakage — to the adversary. Of course, we cannot allow h to be any 
function as otherwise it may just reveal the complete secret key. Hence certain 
restrictions on the class H of admissible leakage functions are necessary. 

With very few exceptions (outlined in the next section) most works assume 
some form of quantitative restriction on the amount of information leaked to an 
adversary. More formally, in the bounded leakage model, it is assumed that $\mathcal{H}$
is the set of all polynomial-time computable functions $h : \{0,1\}^{|sk|} \to \{0,1\}^{\lambda}$
with $\lambda \ll |sk|$. This restriction can be weakened in many cases. Namely, instead
of requiring a concrete bound $\lambda$ on the amount of leakage, it often suffices
that given the leakage $h(sk)$ the secret key still has a “sufficient” amount of
min-entropy left PE1|^|22|-. This so-called noisy leakage models real-world
leakage functions more accurately as now the leakage can be arbitrarily large.
Indeed, real-world measurements of physical phenomena are usually described
by several megabytes or even gigabytes of information rather than by a few bits.

While security against bounded or noisy leakage often provides a first good 
indication for the security of a cryptographic implementation, in practice leakage 
typically information theoretically determines the entire secret key |25j- The 
only difficulty of a side-channel adversary lies in extracting the relevant key 
information efficiently. Formally, this can be modeled by assuming that $\mathcal{H}$ is
the set of all polynomial-time computable functions such that given $h(sk)$ it
is still computationally “hard” to compute sk. Such hard-to-invert leakage is
a very natural generalization of both the bounded leakage model and the noisy
leakage model, and is the focus of this work. More concretely, we will analyze the
security of digital signature schemes in the presence of hard-to-invert leakage. 
We show somewhat surprisingly that simple variants of constructions for the 
bounded leakage setting 0IHII3EIE1 also achieve security with respect to the 
more general class of hard-to-invert leakage. 


1.1 The Auxiliary Input Model 

The auxiliary input model of Dodis, Kalai and Lovett nn introduced the no- 
tion of security of cryptographic schemes in the presence of computationally 
hard-to-invert leakage. They propose constructions for secret key encryption 
with IND-CPA and IND-CCA security against an adversary who obtains an 
arbitrary polynomial- time computable hard-to-invert leakage h( sk). Security is 
shown to hold under a non-standard LPN-related assumption with respect to 
any exponentially hard-to-invert function. We say that h is an exponentially 
hard-to-invert function of the secret key sk, if there exists a constant c > 0 such 
that, for sufficiently large $k = |sk|$, any PPT adversary $\mathcal{A}$ has probability at
most $2^{-ck}$ of inverting $h(sk)$. Notice that the result gets stronger, and the class
of admissible leakage functions gets larger, if $c$ is smaller.

In a follow-up paper, and most relevant for our work, Dodis et al. 0 study 
the setting of public key encryption. They show that the BHHO encryption 
scheme 0 based on DDH and variants of the GPV encryption scheme M based 
on LWE are secure with respect to auxiliary input leakage. All their schemes remain
secure under sub-exponentially hard-to-invert leakage (for a weaker notion
that we discuss below 0 achieves security with respect to polynomially
hard-to-invert leakage). That is, a function $h$ is sub-exponentially hard-to-invert if there
exists a constant $1 > c > 0$ such that $h(sk)$ can be inverted with probability at
most $2^{-k^c}$.

In the public key setting, some important subtleties arise which are also im- 
portant for our work. 

1. We shall allow the leakage to depend also on the corresponding public key
pk. One approach to model this is to let the adversary adaptively choose
the leakage function after seeing the public key pk 0. An alternative that
is taken in the work of Dodis et al. 0 assumes admissible leakage functions
$h : \{0,1\}^{|sk|+|pk|} \to \{0,1\}^*$, where it is hard to compute sk given $h(pk, sk)$.

2. The public key itself may leak information about the secret key. To illustrate
this, consider a contrived scheme, where the public key pk contains the first
$k/2$ bits of the secret key in the clear. Suppose we want to prove security for
leakage functions $h$ with the property that given $h(pk, sk)$, it is at least
$2^{-k/2}$-hard to compute the secret key sk. Given the public key pk and such
leakage that reveals the last $k/2$ bits of the secret key, the scheme from
above becomes completely insecure. To handle this issue, Dodis et al. propose
a weaker notion of auxiliary input security, which assumes that a function
is an admissible leakage if it is hard to compute the secret key even when
given the public key.
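The contrived scheme from the second item can be played through in a few lines. This is only a toy sketch: the key length K, the dictionary layout of the public key and all function names are hypothetical, and no actual signing or encryption is modeled. It merely shows that a leakage function which on its own leaves K/2 unknown bits becomes fatal once it is combined with a public key that exposes the other half.

```python
import secrets

K = 32  # toy secret-key length in bits (hypothetical parameter)


def keygen():
    """Contrived key generation: the public key exposes the first K/2 bits of sk."""
    sk = [secrets.randbelow(2) for _ in range(K)]
    pk = {"leaked_prefix": sk[:K // 2]}
    return pk, sk


def leakage(pk, sk):
    """Leakage h(pk, sk): the last K/2 bits of sk.

    Seen in isolation, this output still leaves K/2 unknown bits,
    so guessing sk succeeds with probability 2**(-K/2).
    """
    return sk[K // 2:]


def recover(pk, leak):
    """An adversary holding pk and the leakage reassembles the full key."""
    return pk["leaked_prefix"] + leak


pk, sk = keygen()
recovered = recover(pk, leakage(pk, sk))
print(recovered == sk)
```

This is why the weaker notion measures the hardness of inverting the leakage in the presence of the public key: under that notion the function `leakage` above is simply not admissible.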

For ease of presentation, we mainly consider in this work this weaker notion 
of auxiliary input security. As shown in 0 , when the public key is short this 
notion implies security for functions h solely under the assumption that given 
h( pk, sk) it is computationally hard to compute sk (i.e., without defining hardness 
with respect to pk). The underlying idea is that the public key can be guessed 
within the proof, which implies that the hardness assumption gets stronger when 
applying this proof technique. Specifically, security is obtained in the presence of 
exponentially hard-to-invert leakage functions. We further note that this weaker 
notion already suffices for composition of different cryptographic schemes using 
the same public key. For instance, consider an encryption and signature scheme 
sharing the same public key. If the encryption scheme is weakly secure with 
respect to any polynomially hard-to-invert leakage function, then the scheme
remains secure even if the adversary sees arbitrary signatures, as these signatures 


A function h is polynomially hard-to-invert auxiliary information, if any probabilistic 
polynomial-time adversary computes sk with negligible probability, given the leakage 
h( sk, pk). 
can be viewed as hard-to-invert leakage. The opposite may not trivially hold for
signature schemes that are secure with respect to (sub-)exponentially
hard-to-invert leakage.

Recently, Brakerski and Goldwasser [5| and Brakerski and Segev |0| proposed 
further constructions of public key encryptions secure against auxiliary input 
leakage. In the former, the authors show how to construct a public key encryption 
scheme secure against sub-exponentially hard-to-invert leakage, based on the QR 
and DCR hardness assumptions. In the latter, the concept of security against 
auxiliary input has been introduced in the context of deterministic public key 
encryption, and several secure constructions were proposed based on DDH and 
subgroup indistinguishability assumptions. 


1.2 Our Contributions 

Despite significant progress on constructing encryption schemes in the auxiliary 
input model, the question of whether digital signature schemes can be built 
with security against hard-to-invert leakage has remained open so far. This is 
somewhat surprising as a large number of constructions for the bounded and 
noisy leakage setting are known pHll%llH HT7HTTl| . In this paper, we close this gap 
and propose the first constructions for digital signature schemes with security in 
the auxiliary input model. As a first contribution of our work, we propose new 
security notions that are attainable in the presence of hard-to-invert leakage. 
We then show that constructions that have been proven to be secure when the 
amount of leakage is bounded, also achieve security in the presence of hard-to- 
invert leakage. In a nutshell, our results can be summarized as follows: 

1. As shown below, existential unforgeability is unattainable in the presence of 
polynomially hard-to-invert leakage. We thus weaken the security notion by 
focusing on the setting where the challenge message is chosen uniformly at 
random. Our construction uses ideas from [dj to achieve security against 
polynomially hard-to-invert leakage when prior to the challenge message the 
adversary only has seen signatures for random messages. Such schemes can 
straightforwardly be used to construct identification schemes with security 
against any polynomially hard-to-invert leakage (cf. Sections Id. 21) . 

2. We show that the generic constructions proposed in EUHE] achieve the 
strongest notion of security, namely existentially unforgeable under chosen 
message attacks, if we restrict the adversary to obtain only exponentially 
hard-to-invert leakage. As basic ingredients these schemes use a family of 
second preimage resistant hash functions, an IND-CCA secure public key 
encryption scheme with labels and a reusable CRS-NIZK proof system. For 
our result to be meaningful, we require both the decryption key and the 
simulation trapdoor of the underlying encryption scheme to be short when 
compared to the length of the signing key for the signature scheme (cf. Sec- 
tion Id. Ml) . 

3. We show an instantiation of this generic transformation that satisfies our 
requirements on the length of the keys based on the 2-Linear hardness 
assumption in pairing-based groups, using the Groth-Sahai proof system [El
(we refer the reader to the full version). 

We elaborate on these results in more detail below. 

Polynomially Hard-to-Invert Leakage and Random Challenges. Impor- 
tantly, security with respect to polynomially hard-to-invert leakage is impossible 
if the message for which the adversary needs to output a forgery, is fixed at the 
time the leakage function is chosen. This is certainly the case for the standard 
security notion of existential unforgeability. One potential weakening of the se- 
curity definition is by requiring the adversary to forge a signature on a random 
challenge message. When the challenge message is sampled uniformly
at random, even though the leakage may reveal signatures for some messages, it 
is very unlikely that the adversary hits a forgery for the challenge message. 

Specifically, inspired by the work of Malkin et al. [E|, we propose a construc- 
tion that guarantees security in the presence of any polynomially hard-to-invert 
leakage, when the challenge message is chosen uniformly at random. The scheme 
uses the message as the CRS for a non-interactive zero-knowledge proof of knowledge
(NIZKPoK). To sign, we use the CRS to prove knowledge of sk such that
vk = H(sk), where H is a second preimage resistant hash function. Therefore,
if an adversary forges a signature given vk and the leakage h(vk, sk) with
non-negligible probability, we can use this forgery to extract a preimage of vk which
either contradicts the second preimage resistance of H or the assumption that h 
is polynomially hard-to-invert. An obvious drawback of this scheme is that prior 
to outputting a forgery for the challenge message the adversary only sees sig- 
natures on random messages. Finally, as a natural application of such schemes, 
we show that auxiliary input security for signatures carries over to auxiliary 
input security of identification schemes. Hence, our scheme can be readily used 
to build simple identification schemes with security against any polynomially 
hard-to-invert leakage function. 

Exponentially Hard-to-invert Leakage and Existential Unforgeability. 

The standard security notion for signature schemes is existential unforgeability 
under adaptive chosen-message attacks [E|. Here, one requires that an adver- 
sary cannot forge a signature of any message m, even when given access to a 
signing oracle. We strengthen this notion and additionally give the adversary 
leakage h(vk, sk), where h is some admissible function from class H. It is easy to 
verify that no signature scheme can satisfy this security notion when the only 
assumption that is made about $h \in \mathcal{H}$ is that it is polynomially hard to compute
sk given h(vk, sk). The reason for this is as follows. Since the secret key
must be polynomially hard to compute even given some set of signatures (and 
the public key), a signature is an admissible leakage function with respect to H. 
Hence, a forgery is a valid leakage. This observation holds even when we define 
the hardness of h with respect to the public key as well. 

Our first observation towards constructing signatures with auxiliary input 
security is that the above issues do not necessarily arise when we consider the 
more restricted class of functions that maintain (sub-)exponential hardness of
inversion. Suppose, for concreteness, that there exists a constant $1 > c > 0$ such
that there exists a probabilistic polynomial-time algorithm, taking as input a
signature and the public key and outputting sk with probability $p$. Here, we
assume that $\mathrm{negl}(k) \geq p \gg 2^{-k^c}$ for some negligible function $\mathrm{negl}(\cdot)$. Then, if we
let $\mathcal{H}$ be the class of functions with hardness at least $2^{-k^c}$, the signing algorithm
is not in $\mathcal{H}$ and hence the artificial counterexample from above does not work
anymore! We instantiate this idea by adding an encryption $C = \mathrm{Enc}_{ek}(sk)$ of the
signing key sk to each signature. The encryption key ek is part of the verification 
key of the signature scheme, but the decryption key dk associated with ek is 
not part of the signing key. However, we set up the scheme such that dk can be 
guessed with probability p. Interestingly, it turns out that recent constructions 
of leakage resilient signatures pTHjUTTl , which originally were designed to protect 
against bounded leakage, use as part of the signature an encryption of the secret 
key. This enables us to prove that these schemes also enjoy security against 
exponentially hard-to-invert leakages. 

One may object that artificially adding an encryption of the secret key to the 
signature is somewhat counter-intuitive as it seems to reduce the security of the 
signature scheme. However, all that is needed for this trick is that guessing dk is 
significantly easier than guessing sk. For a given security level we can therefore
pick the length of dk first, so as to achieve that security level. After that we can
pick the length of sk so as to achieve meaningful leakage bounds. Our concrete
security analysis allows choosing these keys so as to achieve a given security level. Note,
also, that adding trapdoors to cryptographic schemes for what superficially only 
seems to be proof reasons is common in the field - non-interactive zero-knowledge 
being another prominent example. 

For readers familiar with the security proof of the Katz-Vaikuntanathan 
scheme H3, we note that the crux of our new proof is that in the reduction 
we cannot generate a CRS together with its simulation trapdoor. Instead, to 
simulate signatures for chosen messages we will guess the simulation trapdoor. 
Fortunately, we can show that the loss from guessing the simulation trapdoor 
only affects the tightness of the reduction to the inversion hardness of the leakage
functions. As we use a NIZK proof system with a short simulation trapdoor and
only aim for exponentially hard-to-invert leakage functions, we can successfully
complete the reduction. 


Instantiation under the 2-Linear Assumption. As a concrete example, we 
show in the full version how to instantiate our generic transformation using 
the Groth-Sahai proof system based on the 2-linear assumption. This yields
security with respect to any $2^{-6k'}$-hard-to-invert leakage. If we do not wish to
define the hardness with respect to the public key as well, it is possible to guess
it and thus lose an additional factor of $2^{-3k'}$ in the hardness assumption. Here,
$k' := \log(p)$ for a prime p that denotes the order of the group for which the
2-linear assumption holds, and the secret key of our scheme has length $k := \ell \cdot k'$
bits for some constant $\ell \in \mathbb{N}$.
1.3 A Road Map

In Section 2 we specify basic security definitions and our modeling for the
auxiliary input setting. In Section 3 we present our signature schemes for random
messages (Section 3.2) and chosen message attack security (Section 3.3). In the
full version we show how to use signatures on random messages to construct
identification schemes with security against any polynomially hard-to-invert leakage.
We also show an instantiation of the latter signature scheme under the 2-linear
hardness assumption.

2 Preliminaries 

Basic Notation. We denote the security parameter by k and by PPT probabilistic 
polynomial-time. For a set S we write $x \leftarrow S$ to denote that x is sampled
uniformly from S. We write $y \leftarrow A(x)$ to indicate that y is the output of an
algorithm A when running on input x. We denote by $\langle a, b \rangle$ the inner product of
field elements a and b. We use negl(·) to denote a negligible function $\mathsf{negl} : \mathbb{N} \rightarrow \mathbb{R}$,
and we use the $\approx$ notation to denote computational indistinguishability of
families of random variables. 

2.1 Public Key Encryption Schemes 

We introduce the notion of a labeled public key encryption scheme following the 
notation used in 0 . 

Definition 1 (LPKE). We say that PPT algorithms $\Pi$ = (KeyGen, Enc, Dec)
is a labeled public key encryption scheme (LPKE) with perfect decryption if:

- KeyGen, given a security parameter k, outputs keys (ek, dk), where ek is a
public encryption key and dk is a secret decryption key.

- Enc, given the public key ek, a label L and a plaintext message m, outputs a
ciphertext c encrypting m. We denote this by $c \leftarrow \mathsf{Enc}^L(ek, m)$.

- Dec, given a label L, the secret key dk and a ciphertext c with
$c \leftarrow \mathsf{Enc}^L(ek, m)$, outputs m with probability 1. We denote this by
$m \leftarrow \mathsf{Dec}^L(dk, c)$.

Definition 2 (IND-LCCA secure encryption scheme). We say that a labeled
public key encryption scheme $\Pi$ = (KeyGen, Enc, Dec) is an IND-LCCA secure
encryption scheme if, for every admissible PPT adversary $A = (A_1, A_2)$, there
exists a negligible function negl such that the probability $\text{IND-LCCA}_{\Pi,A}(k)$
that A wins the IND-LCCA game as defined below satisfies
$\text{IND-LCCA}_{\Pi,A}(k) \leq \frac{1}{2} + \mathsf{negl}(k)$.

- IND-LCCA game.

$(ek, dk) \leftarrow \mathsf{KeyGen}(1^k)$

$(L, m_0, m_1, \mathit{history}) \leftarrow A_1^{\mathsf{Dec}^{(\cdot)}(dk, \cdot)}(ek)$, s.t. $|m_0| = |m_1|$

$c \leftarrow \mathsf{Enc}^L(ek, m_b)$, where $b \leftarrow \{0, 1\}$

$b' \leftarrow A_2^{\mathsf{Dec}^{(\cdot)}(dk, \cdot)}(c, \mathit{history})$

A wins if b' = b.

An adversary is admissible if it does not query $\mathsf{Dec}^{(\cdot)}(dk, \cdot)$ with (L, c).


Signature Schemes Secure against Hard-to-Invert Leakage 105 


In this work we require a weaker notion, called IND-WLCCA, where the adver- 
sary cannot query the decryption oracle with label L. Namely, we change the 
definition of admissible to mean that the adversary never queries $\mathsf{Dec}^{(\cdot)}(dk, \cdot)$
with any input of the form (L, ·), where L is the label picked to compute the
challenge. We discuss further details why this security notion is needed for our
construction in Section 3.3.


2.2 Signature Schemes 

A signature scheme is a tuple of PPT algorithms $\Sigma$ = (Gen, Sig, Ver) defined as
follows. The key generation algorithm Gen, on input $1^k$, outputs a signing and a
verification key (sk, vk). The signing algorithm Sig takes as input a message m
and a signing key sk and outputs a signature $\sigma$. The verification algorithm Ver,
on input (vk, m, $\sigma$), outputs either 0 or 1 (respectively rejecting or accepting the
signature). A signature scheme has to satisfy the following correctness property:
for any message m and keys $(sk, vk) \leftarrow \mathsf{Gen}(1^k)$,

$\Pr[\mathsf{Ver}(vk, m, \mathsf{Sig}(sk, m)) = 1] = 1.$

The standard security notion for a signature scheme is existential unforgeability
under chosen message attacks. A scheme is said to be secure under this notion
if, even after seeing signatures for chosen messages, no adversary can come up
with a forgery for a new message. In this article, we extend this security notion
and give the adversary additional auxiliary information about the signing key.
To this end, we define a set of admissible leakage functions $\mathcal{H}$ and allow the
adversary to obtain the value h(sk, vk) for any $h \in \mathcal{H}$. Notice that by giving vk
as input to the leakage function, we capture the fact that the choice of h may
depend on vk.

Definition 3 (Existential Unforgeability under Chosen Message and
Auxiliary Input Attacks (EU-CMAA)). We say that a signature scheme
$\Sigma$ = (Gen, Sig, Ver) is existentially unforgeable against chosen message and auxiliary
input attacks (EU-CMAA) with respect to $\mathcal{H}$ if for all PPT adversaries A and any
function $h \in \mathcal{H}$, the probability $\Pr[\mathsf{CMA}_{\Sigma,A,h}(k) = 1]$ is negligible in k,
where $\mathsf{CMA}_{\Sigma,A,h}(k)$ is defined as follows:

Experiment $\mathsf{CMA}_{\Sigma,A,h}(k)$:

$(vk, sk) \leftarrow \mathsf{Gen}(1^k)$

$(m^*, \sigma^*) \leftarrow A^{O(sk, \cdot)}(1^k, h(vk, sk), vk)$

If $m^* \notin M$ return $\mathsf{Ver}(vk, m^*, \sigma^*)$, else return 0.

Oracle $O(sk, m)$:

Return (m, Sig(sk, m))

Here M is the set of messages submitted by A to the oracle.

We note that the leakage may also depend on A’s signature queries as the func- 
tion h may internally run A, using the access to the secret key in order to emulate 
the entire security game, including the signature queries made by A. 
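
The experiment of Definition 3 can be read as a small test harness. The following Python sketch is only an illustration under stated assumptions: the names `cma_experiment`, `lamport_gen`, `lamport_sig`, `lamport_ver` and the two adversaries are hypothetical, and the toy Lamport one-time signature is a stand-in chosen for brevity, not the scheme studied in this paper. It shows where the set M of queried messages and the leakage h(vk, sk) enter the game.

```python
import hashlib
import os


def cma_experiment(gen, sig, ver, h, adversary, k):
    """One run of the EU-CMAA experiment; returns 1 iff the adversary forges."""
    vk, sk = gen(k)
    queried = set()                      # the set M of messages sent to the oracle

    def oracle(m):
        queried.add(m)
        return m, sig(sk, m)

    m_star, sigma_star = adversary(oracle, k, h(vk, sk), vk)
    if m_star in queried:                # a forgery must be on a fresh message
        return 0
    return 1 if ver(vk, m_star, sigma_star) else 0


# Toy Lamport one-time signature (assumption: used only to exercise the harness;
# it is secure for a single signing query at most, and k is ignored).
def _bits(m):
    digest = hashlib.sha256(m).digest()
    return [(byte >> i) & 1 for byte in digest for i in range(8)]


def lamport_gen(k):
    sk = [(os.urandom(32), os.urandom(32)) for _ in range(256)]
    vk = [(hashlib.sha256(a).digest(), hashlib.sha256(b).digest()) for a, b in sk]
    return vk, sk


def lamport_sig(sk, m):
    return [sk[i][b] for i, b in enumerate(_bits(m))]


def lamport_ver(vk, m, sigma):
    return len(sigma) == 256 and all(
        hashlib.sha256(s).digest() == vk[i][b]
        for (i, b), s in zip(enumerate(_bits(m)), sigma)
    )


def replay_adversary(oracle, k, leakage, vk):
    # Returns a signature obtained from the oracle: not a valid forgery.
    return oracle(b"hello")


def key_leak_adversary(oracle, k, leakage, vk):
    # Assumes the leakage is sk itself, which no admissible class allows.
    return b"fresh", lamport_sig(leakage, b"fresh")


def empty_leakage(vk, sk):
    return b""


def identity_leakage(vk, sk):
    return sk                            # inadmissible: sk is trivially computable


if __name__ == "__main__":
    print(cma_experiment(lamport_gen, lamport_sig, lamport_ver,
                         empty_leakage, replay_adversary, 128))
    print(cma_experiment(lamport_gen, lamport_sig, lamport_ver,
                         identity_leakage, key_leak_adversary, 128))
```

The second call illustrates why the class of admissible leakage functions has to be restricted: a function that reveals sk makes forging trivial, whatever the scheme.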
As outlined in the introduction, we are also interested in a weaker security
notion where the adversary is required to output a forgery for a random message 
after seeing signatures for random messages. To this end, we extend the definition 
from above and let the signing oracle reply with random messages, as well as 
pick the challenge message at random. This is formally described in the following 
definition. 


Definition 4 (Random Message Unforgeability under Random Message
and Auxiliary Input Attacks (RU-RMAA)). We say that a signature
scheme $\Sigma$ = (Gen, Sig, Ver) is random message unforgeable against random
message and auxiliary input attacks (RU-RMAA) with respect to $\mathcal{H}$ if for all PPT
adversaries A and any function $h \in \mathcal{H}$, the probability $\Pr[\mathsf{RMA}_{\Sigma,A,h}(k) = 1]$ is
negligible in k, where $\mathsf{RMA}_{\Sigma,A,h}(k)$ is defined as follows:

Experiment $\mathsf{RMA}_{\Sigma,A,h}(k)$:

$(vk, sk) \leftarrow \mathsf{Gen}(1^k)$

$m^* \leftarrow M$, where M is the message space

$\sigma^* \leftarrow A^{O(sk)}(1^k, h(vk, sk), vk, m^*)$

Return $\mathsf{Ver}(vk, m^*, \sigma^*)$.

Oracle $O(sk)$:

$m \leftarrow M$

Return (m, Sig(sk, m))


We notice that this notion of security is useful in some settings. For instance, 
it suffices to construct 2-round identification schemes w.r.t auxiliary inputs. In 
the full version of this article m we propose formal definitions and a simple 
construction of an identification scheme with security in the presence of auxiliary 
input leakage. 

One way to enhance the security notion obtained by Definition 4 is to allow
chosen message attacks, i.e., random message unforgeability under chosen message
and auxiliary input attacks (RU-CMAA). In this game the adversary can
pick the messages to be signed by itself but still needs to forge a signature on a
random message; see Section 3.2 for further discussion.


2.3 Classes of Auxiliary Input Functions 

The above notions of security require specifying the set of admissible functions
$\mathcal{H}$. In the public key setting one can define two different types of classes of
leakage functions. In the first class, we require that given the leakage h(sk, vk) it
is computationally hard to compute sk, while in the second we require hardness of
computing sk when additionally given the public key vk. We follow the work of
Dodis et al. [7] to formally define this difference. In the following, let
$(sk, vk) \leftarrow \mathsf{Gen}(1^k)$ be generated randomly.

- Let $\mathcal{H}_{\mathsf{ow}}(\ell(k))$ be the class of polynomial-time computable functions
$h : \{0,1\}^{|sk|+|vk|} \rightarrow \{0,1\}^*$ such that given h(sk, vk), no PPT adversary can find
sk with probability $\ell(k) \geq 2^{-k}$, i.e., for any PPT adversary A:
$\Pr_{(sk,vk) \leftarrow \mathsf{Gen}(1^k)}[sk \leftarrow A(h(sk, vk))] < \ell(k)$.

- Let $\mathcal{H}_{\mathsf{vkow}}(\ell(k))$ be the class of polynomial-time computable functions
$h : \{0,1\}^{|sk|+|vk|} \rightarrow \{0,1\}^*$ such that given (vk, h(sk, vk)), no PPT adversary
can find sk with probability $\ell(k) \geq 2^{-k}$, i.e., for any PPT adversary A:
$\Pr_{(sk,vk) \leftarrow \mathsf{Gen}(1^k)}[sk \leftarrow A(vk, h(sk, vk))] < \ell(k)$.

Security with respect to auxiliary input gets stronger if $\ell(k)$ is larger. Our goal
is typically to make $\ell(k)$ as large as possible while still being negl(k). If a scheme is
EU-CMAA for $\mathcal{H}_{\mathsf{vkow}}(\ell(k))$ according to Definition 3, we say for short that it is
$\ell(k)$-EU-CMAA. Similarly, if a scheme is RU-RMAA for $\mathcal{H}_{\mathsf{vkow}}(\ell(k))$, then we say
that it is an $\ell(k)$-RU-RMAA signature scheme. If the class of admissible leakage
functions is $\mathcal{H}_{\mathsf{ow}}(\ell(k))$, we will mention it explicitly.

As outlined in the introduction, we typically prove security with respect to
the class $\mathcal{H}_{\mathsf{vkow}}(\ell(k))$. The stronger security notion where hardness is required to
hold only given the leakage, i.e., for the class of admissible functions $\mathcal{H}_{\mathsf{ow}}(\ell(k))$,
can be achieved by a relation between $\mathcal{H}_{\mathsf{ow}}(\cdot)$ and $\mathcal{H}_{\mathsf{vkow}}(\cdot)$ proven by Dodis et
al. [7].

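Lemma 1 ([7]). If |vk| = t(k) then for any $\ell(k)$, we have

1. $\mathcal{H}_{\mathsf{vkow}}(\ell(k)) \subseteq \mathcal{H}_{\mathsf{ow}}(\ell(k))$

2. $\mathcal{H}_{\mathsf{ow}}(2^{-t(k)}\ell(k)) \subseteq \mathcal{H}_{\mathsf{vkow}}(\ell(k))$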

The first point of Lemma 1 says that if no PPT adversary finds sk given (vk,
h(sk, vk)) with probability $\ell(k)$ or better, then no PPT adversary finds sk given
only h(sk, vk) with probability $\ell(k)$ or better. Clearly this is the case since
knowing vk will not make it harder to guess sk. The second point states that if no
PPT adversary finds sk given h(sk, vk) with probability $2^{-t(k)}\ell(k)$ or better,
then any PPT adversary has advantage at most $\ell(k)$ in guessing sk when given
additionally vk. To see this consider a PPT adversary A that finds sk given
(vk, h(sk, vk)) with probability $\ell'(k) > \ell(k)$. A then implies a PPT adversary B
that given h(sk, vk) simply tries to guess vk and uses it to run A. Since B can
guess vk with probability at least $2^{-t(k)}$, B has probability at least $2^{-t(k)}\ell'(k)$
of finding sk, contradicting $h \in \mathcal{H}_{\mathsf{ow}}(2^{-t(k)}\ell(k))$.
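
The guessing argument behind the second point of Lemma 1 can be checked exactly on a tiny example. The sketch below is purely illustrative: the 6-bit key, the 2-bit "public key" `vk_of`, the leakage `h` and the adversary `adversary_a` are all hypothetical toy choices, not objects from the paper. It enumerates all keys and all guesses of vk and confirms that B, who sees only the leakage, succeeds with probability at least $2^{-t}$ times that of A.

```python
from fractions import Fraction

N_BITS, T_BITS = 6, 2            # toy sizes: |sk| = 6 bits, |vk| = t = 2 bits


def vk_of(sk):
    return sk & 0b11             # toy public key: the two low bits of sk


def h(sk, vk):
    return sk >> 4               # toy leakage: the two high bits of sk


def adversary_a(vk, leak):
    # A is given (vk, h(sk, vk)) and guesses that the two middle bits are zero.
    return (leak << 4) | vk


def success_a():
    wins = sum(adversary_a(vk_of(sk), h(sk, vk_of(sk))) == sk
               for sk in range(2 ** N_BITS))
    return Fraction(wins, 2 ** N_BITS)


def success_b():
    # B is given only the leakage, guesses vk uniformly at random and runs A.
    wins = sum(adversary_a(guess, h(sk, vk_of(sk))) == sk
               for sk in range(2 ** N_BITS)
               for guess in range(2 ** T_BITS))
    return Fraction(wins, 2 ** N_BITS * 2 ** T_BITS)


if __name__ == "__main__":
    print(success_a(), success_b())
```

In this toy case the bound is tight: B succeeds exactly when its guess of vk is correct and A would have succeeded.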

3 Signature Schemes with Auxiliary Input Security 

3.1 A Warm-Up Construction 

In order to illustrate the difficulties encountered in designing cryptographic prim- 
itives in the auxiliary input setting we present a warm-up construction of a sig- 
nature scheme that may seem secure at first glance but, unfortunately, proving 
its security is impossible. Essentially, the problem arises due to the computa- 
tional hardness of the leakage and does not occur in other leakage models, where 
given the leakage the secret key is still information theoretically hidden. For 
ease of understanding, in this warm-up construction we only aim for the simpler 
one-time security notion on random messages, where the adversary only views a 
single signature before it outputs its forgery on a random message. We consider 
two building blocks for the following scheme: 
1. A family $\mathcal{H}$ of second preimage resistant hash functions.

2. A non-interactive zero-knowledge proof of knowledge (NIZKPoK) system $\Pi$ =
(CRSGen, P, V) for proving knowledge of a secret value x so that $y = H_s(x)$
given s and y. We further require that the CRS's of $\Pi$ are uniformly random
strings of some length p(k) for security parameter k and some polynomial 
p(-). Denote the message space M. by {0, 

Informally, the signature scheme is built as follows. The signing key sk is a
random element x in the domain of the hash function, whereas the verification
key vk is $y = H_s(x)$. The verification key vk also contains a common reference
string crs for $\Pi$. A signature on a message m is the bit $b = \langle m, sk \rangle$ together with
a non-interactive proof with respect to crs proving that b was computed as the
inner product of the preimage of y and the message m. More precisely, define
the signature scheme $\Sigma = (\mathsf{Gen}_\Sigma, \mathsf{Sig}_\Sigma, \mathsf{Ver}_\Sigma)$ as follows:

Key Generation, $\mathsf{Gen}_\Sigma(1^k)$: Sample a second preimage resistant hash function
$H_s$ from $\mathcal{H}$, a random element x in the domain of $H_s$ and
$crs \leftarrow \mathsf{CRSGen}(1^k)$. Output sk = x, vk = ($H_s(x)$, crs).

Signing, $\mathsf{Sig}_\Sigma(sk, m)$: Parse vk as ($H_s(sk)$, crs). Compute $b = \langle m, sk \rangle$. Use the crs
to generate a non-interactive zero-knowledge proof of knowledge $\pi$,
demonstrating that $b = \langle m, sk \rangle$ and $H_s(sk) = y$. Output $\sigma = (b, \pi)$.

Verifying, $\mathsf{Ver}_\Sigma(vk, m, \sigma)$: Parse vk as ($H_s(sk)$, crs) and $\sigma$ as $(b, \pi)$. Use crs to
verify the proof $\pi$. Output 1 if the proof is verified correctly and 0 otherwise.
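
As a small illustration of the signed value in the warm-up scheme, the sketch below computes the bit $b = \langle m, sk \rangle$. It assumes that messages and keys are bit strings of equal length packed into bytes and that the inner product is taken over GF(2); the function name `inner_product_bit` is hypothetical, and the NIZKPoK part of the signature is not modeled.

```python
def inner_product_bit(m: bytes, sk: bytes) -> int:
    """Inner product over GF(2) of two equal-length bit strings packed in bytes."""
    if len(m) != len(sk):
        raise ValueError("message and key must have the same length")
    acc = 0
    for a, b in zip(m, sk):
        acc ^= a & b                     # XOR-accumulate the bitwise products
    return bin(acc).count("1") & 1       # parity of the accumulated byte


if __name__ == "__main__":
    sk = bytes([0b10110010, 0b00000001])
    m = bytes([0b10000010, 0b00000001])
    print(inner_product_bit(m, sk))
```

The map is linear in m for a fixed key, which is the property the proof of knowledge attests to in each signature.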

We continue with an attempt to prove security. Note first that by the properties
of $\Pi$, the ability to generate a forgery $(\sigma', m')$ reduces to the ability, using the
extraction trapdoor, to either find a second preimage for the hash function or
break the hardness assumption of the leakage function. As the difficulties arise in
the reduction to the hardness of the leakage function, we focus in this outline on
that part. Assume there is an adversary A attacking the signature scheme $\Sigma$ given
auxiliary input leakage h(sk, vk) and (y, crs). Then, an attempt to construct B
that breaks the hardness assumption of the leakage function by invoking A works
as follows. B obtains (y, crs) and the leakage h(sk, vk) from its challenge oracle.
It forwards them to A, who will ask for a signature query. Unfortunately, at that
point we are not able to answer this query as we cannot simulate a proof without
knowing the witness or the trapdoor.

An alternative approach may be to directly prove security with respect to
the leakage class $\mathcal{H}_{\mathsf{ow}}(\ell(k))$ and let B sample the CRS herself using the
zero-knowledge simulator to know a trapdoor. Unfortunately, this approach is also
doomed to fail, as in this case there is no way to learn a $y = H_s(sk)$ that is
consistent with the leakage. Moreover, this results in several difficulties in defining
the set of admissible leakage functions as they must be different now for A and
B. This can be illustrated as follows. Suppose that the CRS is a public key for
an encryption scheme and the trapdoor is the corresponding secret key. As A 
only knows the CRS but not the trapdoor a leakage function h that outputs 

For definition of NIZKPoK we refer to the full version of this article m 
an encryption of sk = x is admissible. On the other hand, however, for B who
knows the trapdoor (hence the secret key of the encryption scheme) such leakage 
cannot be admissible. This shows that we need to consider different approaches 
when analyzing the security of digital signature schemes in the presence of aux- 
iliary input. In what follows, we demonstrate two different approaches for such 
constructions, obtaining two different notions of security. 

3.2 A RU-RMAA Signature Scheme 

In this section we present our construction of a RU-RMAA signature scheme as
defined in Definition 4. For this scheme we assume the following building blocks:

1. A family $\mathcal{H}$ of second preimage resistant hash functions with input length
$k_1$ and key sampling algorithm $\mathsf{Gen}_H$.

2. A NIZKPoK system $\Pi$ = (CRSGen, P, V) for proving knowledge of a secret
value x so that $y = H_s(x)$ given s and y. We further require that the CRS's
of $\Pi$ are uniformly random strings of some length p(k) for security parameter
k and some polynomial p(·). Denote the message space M by $\{0, 1\}^{p(k)}$.

The main idea for the scheme is inspired by the work of Malkin et al. [19], where
we view each message m as a common reference string for the proof system $\Pi$.
Since m is uniformly generated, we are guaranteed that the CRS is generated
correctly and knowledge soundness holds. Intuitively, since each new message
induces a new CRS, each proof is given with respect to an independent CRS. This
implies that in the security proof the simulator (playing the role of the signer)
can use the trapdoor of the CRS that corresponds to the challenge message m*.
We formally define our scheme $\Sigma$ = (Gen, Sig, Ver) as follows.

Key Generation, Gen($1^k$): Sample $s \leftarrow \mathsf{Gen}_H(1^k)$. Sample $x \leftarrow \{0, 1\}^{k_1}$ and
compute $y = H_s(x)$. Output sk = (x, s) and vk = (y, s).

Signing, Sig(sk, m): To sign $m \leftarrow M$, let crs = m and sample the signature
$\sigma \leftarrow P(crs, vk, sk)$ as a proof of knowledge of x such that $y = H_s(x)$.

Verifying, Ver(vk, m, $\sigma$): To verify $\sigma$ on m = crs, output V(crs, vk, $\sigma$).

Theorem 1. Assume that $\mathcal{H}$ is a second preimage resistant family of hash functions
and $\Pi$ = (CRSGen, P, V) is a NIZKPoK system. Then $\Sigma$ = (Gen, Sig, Ver)
is a negl(k)-RU-RMAA signature scheme.

The intuition of the proof is that if one can efficiently forge a signature on a
random m* after getting signatures on random messages m, then one can also
efficiently compute x, contradicting the assumption that the leakage is hard to
efficiently invert. During the simulated attack the signatures on random messages
m are simulated by sampling m = crs, where crs is sampled along with the
simulation trapdoor. In the end one samples m* = crs, where crs is sampled along
with the extraction trapdoor. Upon getting a forgery on m*, we can extract x
using the extraction trapdoor.
In the standard setting, a simple modification using Chameleon hash func- 
tions [El enables to achieve a stronger notion of security. Recall first that 
Chameleon hash functions are collision resistant hash functions such that given
a trapdoor one can efficiently find collisions for every given preimage and its
hashed value. Thereby, instead of signing random messages the scheme can be
modified so that the signer signs the hashed value of the message. This achieves
chosen message attack security, where the adversary picks the messages to be
signed during the security game, yet the challenge is still picked at random.
Nevertheless, when introducing hard-to-invert leakage into the system this approach
does not yield security against polynomially hard-to-invert leakage,
because we run into the same problem specified in Section 3.1. Moreover, in
Section 3.3 we show how to obtain the strongest security notion of existential
unforgeability under chosen message and auxiliary input attacks.

Proof. Let $\mathsf{Exp}_{\Sigma,A,h}(k)$ denote the experiment $\mathsf{RMA}_{\Sigma,A,h}(k)$ defined in Definition 4
for PPT adversary A and leakage function $h \in \mathcal{H}_{\mathsf{vkow}}(\mathsf{negl}(k))$. Furthermore let W be
the event that A wins the game. We show that Pr[W] is negligible. Denote this
probability by $p_0$. Consider the following modification to $\mathsf{Exp}_{\Sigma,A,h}(k)$.

1. Generate (vk, sk) as in $\mathsf{Exp}_{\Sigma,A,h}(k)$.

2. Instead of sampling the challenge m* as $m^* \leftarrow M$, sample $(m', td_e) \leftarrow E_1(1^k)$
and let m* = m', where $E = (E_1, E_2)$ is the knowledge extractor for $\Pi$.

3. Give input to A as in $\mathsf{Exp}_{\Sigma,A,h}(k)$.

4. To answer the oracle queries of A, sample $(m', td_s) \leftarrow S_1(1^k)$, let m = m' and
return the signature $(m, S_2(m, vk, td_s))$, where $S = (S_1, S_2)$ is the simulator
for $\Pi$.

5. Receive a forgery $\sigma^*$ from A as in $\mathsf{Exp}_{\Sigma,A,h}(k)$.

6. Output as in $\mathsf{Exp}_{\Sigma,A,h}(k)$.

Let $p_1$ be the probability that the modified experiment above outputs 1. Also
consider $x' = E_2(m^*, vk, td_e, \sigma^*)$, i.e., x' is a signing key extracted from A's
forgery. By $\Pi$ being a NIZKPoK we have that the distributions of messages and
signatures in the modified experiment are indistinguishable from the distributions
in the original experiment $\mathsf{Exp}_{\Sigma,A,h}(k)$. Thus it follows that $p_1$ is negligibly close
to $p_0$. Let $p_2$ be the probability that $H_s(x') = y$. By the knowledge soundness
of $\Pi$ it follows that $p_2$ is negligibly close to $p_0$.

Note then that, since S and E are both PPT algorithms, the modified experiment
describes a PPT algorithm which computes x' where with probability $p_2$
it holds that $y = H_s(x')$. Let $p_3$ be the probability that $y = H_s(x')$ and $x' \neq x$,
and let $p_4$ be the probability that x' = x. Note that $p_2 = p_3 + p_4$.

The Event X. Consider the PPT algorithm B that, given vk and leakage
h(sk, vk), where $(sk, vk) \leftarrow \mathsf{Gen}(1^k)$, runs the remaining steps of the modified
experiment above and outputs $x^* = E_2(m^*, vk, td_e, \sigma^*)$. Denote by X the event
in which B outputs x* = x. Since (vk, sk) is generated as in $\mathsf{Exp}_{\Sigma,A,h}(k)$, we have
$\Pr[X] \geq p_4$. Thus by definition of $\mathcal{H}_{\mathsf{vkow}}(\mathsf{negl}(k))$, $p_4$ is negligible.

The Event C. On the other hand, consider the PPT algorithm B that is given
s, x and $y = H_s(x)$. B lets vk = (y, s) and runs the remaining steps of the modified
experiment above (notice that B is given x, so it can compute the leakage h) and
outputs $x^* = E_2(m^*, vk, td_e, \sigma^*)$. Denote by C the event in which B outputs
$x^* \neq x$ such that $H_s(x^*) = H_s(x)$. Notice again that $((H_s, y), x)$ are generated as
in $\mathsf{Exp}_{\Sigma,A,h}(k)$ and therefore $\Pr[C] \geq p_3$. Thus by the second preimage resistance
of the family $\mathcal{H}$, $p_3$ is negligible.

This implies that $p_3$ and $p_4$ are negligible and so is $p_2 = p_3 + p_4$. Since
$p_0$ is negligibly close to $p_2$, $p_0$ must also be negligible. By definition
$p_0 = \Pr[\mathsf{Exp}_{\Sigma,A,h}(k) = 1]$ and so by Definition 4 $\Sigma$ is a negl(k)-RU-RMAA
signature scheme. □
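
For orientation, the chain of bounds used in the proof of Theorem 1 can be collected in one display. This is only a summary of the steps stated in the proof, using the same symbols $p_0, p_2, p_3, p_4$ and the events X and C:

```latex
\begin{align*}
p_0 &\le p_2 + \mathsf{negl}(k)
      && \text{($p_0$ is negligibly close to $p_2$)} \\
    &=   p_3 + p_4 + \mathsf{negl}(k)
      && \text{($p_2 = p_3 + p_4$)} \\
    &\le \Pr[C] + \Pr[X] + \mathsf{negl}(k)
      && \text{($\Pr[C] \ge p_3$, $\Pr[X] \ge p_4$)}
\end{align*}
```

Both remaining terms are negligible: Pr[C] by second preimage resistance and Pr[X] by the hardness of inverting the leakage.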

Notice that in the above we assume that the CRS of the NIZKPoK $\Pi$ is a
uniformly random bit string. As an example of a NIZKPoK with this property
we can use the construction of [22]. In their construction the CRS is a pair (ek, r)
where r is a random string and ek is an encryption key for some semantically 
secure public-key encryption scheme. Thus, we can use the construction of ESI 
with a public-key encryption scheme where uniformly random bit strings can act 
as public-keys, like Regev’s LWE scheme EH- 


3.3 A EU-CMAA Signature Scheme 

In this section we build an EU-CMAA signature scheme. We use k to denote the
security parameter. We need the following tools:

1. A family of second preimage resistant hash functions $\mathcal{H}$ with key sampling
algorithm $\mathsf{Gen}_H$, where the input length can be set to be any $k_4 = \mathrm{poly}(k)$
and where the length of the randomness used by $s \leftarrow \mathsf{Gen}_H(1^k)$ is some
$l_1 = \mathrm{poly}(k)$ independent of $k_4$ and where the length of an output $y = H_s(x)$
is some $l_4 = \mathrm{poly}(k)$ independent of $k_4$. I.e., it is possible to increase the input
length of $H_s$ without increasing the randomness used to generate s or the
output length.

2. An IND-WLCCA secure labeled public-key encryption scheme $\Gamma$ = (KeyGen,
Enc, Dec) with perfect decryption (cf. Definition 1), where the length of dk
is some $l_2 = \mathrm{poly}(k)$ independent of the length of the messages that $\Gamma$ can
encrypt.

3. A reusable-CRS non-interactive zero-knowledge proof system (NIZK, see
footnote 3) $\Pi$ = (CRSGen, P, V), where the length of the simulation trapdoor
$td_s$ at security level k is some $l_3 = \mathrm{poly}(k)$ independent of the size of the
proofs that the NIZK can handle.

The IND-WLCCA secure encryption scheme might be replaced by an IND-CPA
secure scheme, but at the price of then using a simulation-sound NIZK instead.
We expect a general proof via true simulation extractability to work along the 

3 For definition of reusable-CRS NIZK we refer to the full version of this article m 
lines of jHJ. We chose the above tools as they lend themselves nicely towards
our concrete instantiation.

The reason why we use IND-WLCCA is that our signature scheme needs
to encrypt its secret key, which is much longer than the decryption key. For that
we need to break the secret key into blocks and encrypt each block separately
under the same label (looking ahead, the label would be the signed message).
Note that a labeled public-key encryption scheme for arbitrary length messages is
not implied by an LCCA secure scheme for fixed length messages. This is because
the adversary can change the order of the ciphertexts within a specific set of
ciphertexts and ask for a decryption. We therefore work with the weaker notion
that is sufficient for our purposes to design secure signature schemes, and is 
easier to instantiate as demonstrated in the full version of this article B2- 
Our scheme $\Sigma$ works as follows:

Key Generation, Gen($1^k$): Sample $s \leftarrow \mathsf{Gen}_H(1^k)$ and $(ek, dk) \leftarrow \mathsf{KeyGen}(1^k)$.
Furthermore, sample $(crs, td_s) \leftarrow S_1(1^k)$ and $x \leftarrow \{0, 1\}^{k_4}$, where
$S = (S_1, S_2)$ is the simulator for $\Pi$ (see footnote 4). Compute $y = H_s(x)$.
Set (sk, vk) = (x, (y, s, ek, crs)).

Signing, Sig(sk, m): Compute $C = \mathsf{Enc}^m(ek, x)$. Using crs and $\Pi$, generate a
NIZK proof $\pi$ proving that $\exists x\,(C = \mathsf{Enc}^m(ek, x) \wedge y = H_s(x))$. Output
$\sigma = (C, \pi)$.

Verifying, Ver(vk, m, $\sigma$): Parse $\sigma$ as $(C, \pi)$. Use crs and V to verify the NIZK
proof $\pi$. Output 1 if the proof verifies correctly and 0 otherwise.

As explained in jH|, a NIZK proof system together with a CCA-secure encryption
scheme is a specific instantiation of true-simulation extractability (tSE). An
alternative instantiation would be to compose a simulation-sound NIZK with a
CPA-secure encryption scheme. This approach was used in [17]. We note that
our proof follows similarly for this instantiation as well.

Theorem 2. If $\mathcal{H}$, $\Gamma$ = (KeyGen, Enc, Dec) and $\Pi$ = (CRSGen, P, V) have the
properties listed above, then $\Sigma$ is $2^{-k_5}$-EU-CMAA where $k_5 = k + l_2 + l_3$ and
where

- k is the security parameter of $\Sigma$,

- $l_1$ is the length of the randomness used to sample s at security parameter $k_1$
for $\mathcal{H}$,

- $l_2$ is the length of the decryption key dk at security parameter $k_2$ for $\Gamma$,

- $l_3$ is the length of the simulation trapdoor $td_s$ at security parameter $k_3$ for
$\Pi$.

If we consider the class $\mathcal{H}_{\mathsf{ow}}(\ell(k))$, then our scheme is $2^{-k_6}$-EU-CMAA where
$k_6 = k + l_1 + l_2 + l_3 + l_4$ and where $l_4$ is the length of $y = H_s(x)$ at security
parameter $k_1$ for $\mathcal{H}$.

4 It is deliberate that we use a simulated CRS as part of the public key. This makes 
the set of admissible leakage functions defined relative to a simulated CRS, which 
we use in the proof. The scheme might be secure for a normal CRS too, but the 
proof would be more complicated. 
Specifically, the best success against $\Sigma$ in the forging game with $2^{-k_5}$-hard leakage
by a PPT adversary A is $2^{-k} + \epsilon_0 + \epsilon_1 + \epsilon_2 + \epsilon_3 + u\epsilon_4$, where u is a polynomial and

- $\epsilon_0$ and $\epsilon_3$ are the advantages of some PPT adversaries in the ZK game
against $\Pi$ at security parameter $k_3$,

- $\epsilon_1$ is the success probability of some PPT adversary in the soundness game
against $\Pi$ at security parameter $k_3$,

- $\epsilon_2$ is the probability that some PPT adversary wins the second preimage game
against $\mathcal{H}$ on security parameter $k_1$ and $x \leftarrow \{0, 1\}^{k_4}$,

- $\epsilon_4$ is the advantage of some PPT adversary in the IND-WLCCA game
against $\Gamma$ at security parameter $k_2$.

The intuition behind the proof of security is that a forged signature contains an 
encryption of the secret key x, so forging leads to extracting x using dk, giving 
a reduction to the assumption that it is hard to compute x given the leakage. 
In doing this reduction the signing oracle is simulated by encrypting $0^{k_4}$ and
simulating the proofs using the simulation trapdoor $td_s$. This will clearly still
lead to an extraction of x, using reusable-CRS NIZK and IND-WLCCA. The
only hurdle is that given (vk, h(sk, vk)), we do not know dk or $td_s$. We can,
however, guess these with probability $2^{-l_2}$ respectively $2^{-l_3}$. This is why we
only get security for $k_5 = k + l_2 + l_3$. When we prove security for $\mathcal{H}_{\mathsf{ow}}(\ell(k))$ the
reduction is not given vk either, so we additionally have to guess s and y, leading
to $k_6 = k + l_1 + l_2 + l_3 + l_4$.

If we set $k_4 = k + l_2 + l_3 + l_4 + L$, then the min-entropy of x given $y = H_s(x)$
is $k + l_2 + l_3 + L$, so leaking L bits would be an admissible leakage in the $2^{-k_5}$
security game. Since, by assumption on our primitives, $l_2$, $l_3$ and $l_4$ do
not grow with $k_4$, it follows that we can set L to be any polynomial and be
secure against leaking any fraction $(1 - k^{-O(1)})$ of the secret key. Due to space
constraints the complete proof is found in the full version m 
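
The parameter accounting above can be made concrete with a short calculation. The sketch below is illustrative only: the function name `leakage_parameters` and the numeric sizes in the example call are hypothetical and do not come from the instantiation in the paper. It evaluates $k_4$, $k_5$, $k_6$ and the resulting leakage fraction $L / k_4$ exactly as defined in the text.

```python
from fractions import Fraction


def leakage_parameters(k, l1, l2, l3, l4, L):
    """Parameter accounting for the EU-CMAA scheme; all lengths are in bits."""
    k4 = k + l2 + l3 + l4 + L          # input length of H_s, i.e. the length of x
    k5 = k + l2 + l3                   # exponent when hardness is defined given vk
    k6 = k + l1 + l2 + l3 + l4         # exponent when s and y must be guessed too
    entropy_given_y = k4 - l4          # y = H_s(x) reveals at most l4 bits about x
    entropy_after_leak = entropy_given_y - L
    return {
        "k4": k4,
        "k5": k5,
        "k6": k6,
        "entropy_given_y": entropy_given_y,
        "entropy_after_leak": entropy_after_leak,
        "leak_fraction": Fraction(L, k4),
    }


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to show the arithmetic.
    params = leakage_parameters(k=128, l1=256, l2=512, l3=512, l4=256, L=10_000)
    for name, value in params.items():
        print(name, value)
```

Since $l_2$, $l_3$ and $l_4$ stay fixed while L grows, the leaked fraction $L / k_4$ approaches 1 as L increases, and the remaining min-entropy stays at $k_5$ bits.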

The following is a corollary to Theorem 2.

Theorem 3. If $\mathcal{H}$, $\Gamma$ and $\Pi$ have the properties listed above, then $\Sigma$ is
$2^{-k_5}$-EU-CMAA where $k_5 = k + l_2 + l_3$ and $l_1$ is the length of the randomness used
to sample s, $l_2$ is the length of the decryption key dk for $\Gamma$, $l_3$ is the length of the
simulation trapdoor $td_s$. In particular, $\Sigma$ is $2^{-k_5}$-EU-CMAA for $k_5 = \mathrm{poly}(k)$
which does not grow with $k_4$, i.e., the input length of the hash function.

If we consider the class $\mathcal{H}_{\mathsf{ow}}(\ell(k))$, then $\Sigma$ is $2^{-k_6}$-EU-CMAA where
$k_6 = k + l_1 + l_2 + l_3 + l_4$ and where $l_4$ is the length of $y = H_s(x)$.

Our concrete instantiation has all the needed properties, except that s has a
length which depends on $k_4$. This, however, can be handled generically as follows.

Lemma 2. If there exists an $\epsilon$-secure family of second preimage resistant hash
functions $\mathcal{H}$, with key sampling algorithm $\mathsf{Gen}_H$, and a $\delta$-secure pseudo-random
generator prg, then there exists an $(\epsilon + \delta)$-secure family of second preimage
resistant hash functions $\mathcal{H}$, with key sampling algorithm $\mathsf{Gen}'_H$, where $s \leftarrow \mathsf{Gen}'_H(1^k)$
can be guessed with probability $2^{-k_0}$, where $k_0 = \mathrm{poly}(k)$ is the seed length of prg
at security level k.
Proof. Let $\mathrm{Gen}'_H(1^k; r \in \{0,1\}^{k_0}) = \mathrm{Gen}_H(1^k; \mathrm{prg}(r))$. It is clear that an output
of $\mathrm{Gen}'_H(r \in \{0,1\}^{k_0})$ can be guessed with probability $2^{-k_0}$, by guessing r. Let

$$\epsilon = \Pr_{x^* \leftarrow A(s,x) \,\wedge\, x \leftarrow \{0,1\}^{k_4} \,\wedge\, s \leftarrow \mathrm{Gen}_H}\left[H_s(x^*) = H_s(x) \wedge x^* \ne x\right],$$

and let $\epsilon' = \Pr_{x^* \leftarrow A(s,x) \,\wedge\, x \leftarrow \{0,1\}^{k_4} \,\wedge\, s \leftarrow \mathrm{Gen}'_H}\left[H_s(x^*) = H_s(x) \wedge x^* \ne x\right]$. Consider
the algorithm B(s) which samples $x \leftarrow \{0,1\}^{k_4}$ and $x^* \leftarrow A(s,x)$ and outputs
1 iff $H_s(x^*) = H_s(x)$ and $x^* \ne x$. This algorithm is PPT, and
$\epsilon' = \Pr[B(\mathrm{Gen}_H(\mathrm{prg}(r \leftarrow \{0,1\}^{k_0}))) = 1]$ and $\epsilon = \Pr[B(\mathrm{Gen}_H(r \leftarrow \{0,1\}^*)) = 1]$. By the prg being a
$\delta$-pseudo-random generator, it follows that $|\epsilon' - \epsilon| \le \delta$. □
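
The construction in the proof can be sketched in a few lines. This is only an illustration under stated assumptions: SHAKE-256 stands in for the pseudo-random generator prg, and `gen_h` is a hypothetical key sampler that simply returns its coins as the hash key s. The names `prg`, `gen_h`, `gen_h_prime`, `K0_BYTES` and `COINS_BYTES` are not from the paper.

```python
import hashlib

K0_BYTES = 16     # assumed seed length k_0 of the stand-in PRG, in bytes
COINS_BYTES = 64  # assumed number of random bytes consumed by Gen_H


def prg(seed: bytes, out_len: int) -> bytes:
    """Stand-in PRG (an assumption): stretch a short seed with SHAKE-256."""
    return hashlib.shake_256(seed).digest(out_len)


def gen_h(coins: bytes) -> bytes:
    """Hypothetical Gen_H: the hash key s is just the random coins."""
    return coins


def gen_h_prime(seed: bytes) -> bytes:
    """Gen'_H(1^k; r) = Gen_H(1^k; prg(r)) for a short seed r."""
    if len(seed) != K0_BYTES:
        raise ValueError("seed must be exactly K0_BYTES bytes long")
    return gen_h(prg(seed, COINS_BYTES))
```

Because the key is a deterministic function of the short seed, guessing the seed suffices to guess the key, which is the property the lemma exploits.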

Remark. We can also prove security in the stronger model, where the leakage 
function h sees not only sk, but the randomness used by Gen to generate (vk, sk). 
In that case we need that the distribution on ek induced by sampling (ek, dk) 
with $\mathrm{KeyGen}_\Gamma$, the distribution of a crs sampled along with a trapdoor, and
the distribution on s induced by sampling $s \leftarrow \mathrm{Gen}_H$ can all be sampled with
invertible sampling. This is indeed the case for our concrete instantiation. The
only problematic point is Lemma 2. Even if $\mathrm{Gen}_H(\{0,1\}^*)$ has invertible sampling,
it would be very surprising if $\mathrm{Gen}_H(\mathrm{prg}(\{0,1\}^{k_0}))$ has invertible sampling.
So, if the probability of guessing a random $s \leftarrow \mathrm{Gen}_H$ is not independent of the
input of H s , we cannot generically add this property. One can circumvent this 
problem as in jOj and consider s as a public parameter of the scheme. This is 
modeled by sampling s in a parameter generation phase prior to the key gen- 
eration phase and give s as input to all entities. This would in turn make s an 
input to the reduction (called B 7 in the appendix), circumventing the problem 
of having to guess s. We would get security when considering the class 
for $k_S = k + l_2 + l_3 + l_4$.
Abstract. Understanding the minimal assumptions required for carrying
out cryptographic tasks is one of the fundamental goals of theoretical
cryptography. A rich body of work has been dedicated to understanding 
the complexity of cryptographic tasks in the context of (semi-honest) se- 
cure two-party computation. Much of this work has focused on the char- 
acterization of trivial and complete functionalities (resp., functionalities 
that can be securely implemented unconditionally, and functionalities 
that can be used to securely compute all functionalities). 

All previous works define reductions via an ideal implementation of 
the functionality; i.e., f reduces to g if one can implement f using an
ideal box (or oracle) that computes the function g and returns the output
to both parties. Such a reduction models the computation of f as
an atomic operation. However, in the real world, protocols proceed in
rounds, and the output is not learned by the parties simultaneously. In 
this paper we show that this distinction is significant. Specifically, we 
show that there exist symmetric functionalities (where both parties re- 
ceive the same outcome), that are neither trivial nor complete under 
“ideal-box reductions”, and yet the existence of a constant-round pro- 
tocol for securely computing such a functionality implies infinitely-often 
oblivious transfer (meaning that it is secure for infinitely-many n’s). In 
light of the above, we propose an alternative definitional infrastructure 
for studying the triviality and completeness of functionalities. 

1 Introduction 

Secure Computation and Completeness. In the setting of secure two-party 
computation, two parties with respective private inputs x and y wish to compute
a function f of their inputs. The computation should preserve a number of
security properties, like privacy (meaning that nothing but the specified output 
is learned), correctness and more. 

In the late 1980s, it was shown that every function can be securely computed 
in the presence of semi-honest and malicious adversaries, assuming the existence
of enhanced trapdoor permutations J1 8l(ij . Soon after, it was shown that any
function can be securely computed, given an ideal box for computing the oblivious
transfer function. This work demonstrated that there exist “complete”
functions for secure computation; that is, functions that can be used to securely 
compute all other functions. Such functions are of great interest. On the one 
hand, when attempting to base secure computation on weaker hardness assump- 
tions, it suffices to construct a secure protocol for a complete function based 
on some weaker assumption, since it will imply that this assumption suffices for 
securely computing all functions. On the other hand, it is immediate that a com- 
plete function is the “hardest” to compute, at least with respect to the minimum 
hardness assumption. Due to the above, much research has been carried out in 
an attempt to classify functions as complete or not, and as trivial or not (where 
triviality means that it can be securely computed without any assumption). 

The Complexity of Secure Computation. Currently, we have a good picture 
regarding the complexity of secure computation, through the aforementioned 
research of completeness. For example, we know that in the setting of asymmetric 
functionalities (where only one of the two parties receives output), every two- 
party (deterministic) asymmetric function is either complete or trivial nun- 
Thus, no non-trivial asymmetric function can be securely computed under an 
assumption weaker than that needed for securely computing oblivious transfer. 

However, in the setting of symmetric functionalities, where both parties receive
the same output, the picture is more complex [10,13,15]. For example,
unlike the asymmetric setting, there exist (deterministic) symmetric functions 
that are neither complete nor trivial; see Figure 1 below for an example of such
a function. This raises the following fundamental question:

What hardness assumptions are sufficient and necessary for securely 

computing functions that are neither complete nor trivial? 

The starting point of this work is the above question. We stress that although 
Kilian m separated these functions from all complete functions, hinting that 
it may be possible to devise secure protocols for such functions relying on as- 
sumptions that are strictly weaker than those needed for oblivious transfer, the 
only known protocols for securely computing non-trivial functions are general 
protocols that rely on hardness assumptions that can be used to compute any 
function including oblivious transfer. 

Black-Box Reductions and Black-Box Separations. As we have men- 
tioned, a large body of work has been dedicated to understanding the complexity 
of cryptographic tasks in the context of (semi-honest) secure two-party compu- 
tation (see, e.g., |1 1911 ( H I 1217115] h The idea underlying much of this work is 
that if the possibility to securely compute a functionality $f_1$ implies the possibility
to securely compute a functionality $f_2$, then $f_1$ is at least as hard as $f_2$.
It is then said that $f_2$ reduces to $f_1$. A functionality f is called complete if all
other functionalities reduce to f. The question of how to define the notion of
reduction is of great importance to the implication of these results. 
All previous works define a reduction via an ideal implementation of a function;
i.e., $f_2$ reduces to $f_1$ if a secure protocol for computing $f_2$ can be constructed
given an ideal box (trusted party or oracle) that computes $f_1$ and gives the outputs
to both parties simultaneously.¹ The advantage of (black-box) reductions
of the above type is that they provide a constructive way of securely computing
one functionality given an implementation of another. However, the disadvan- 
tage of black-box reductions is that a separation (i.e., a proof that one function 
does not reduce to another) does not necessary imply that one cannot construct 
a secure protocol for one function given a secure protocol from the other. This 
is due to the fact that a reduction may be nonblack-box. 

Our Contributions. In this work we give substantial evidence that the picture
of computational hardness of securely computing two-party functionalities
in the presence of semi-honest adversaries is different from that drawn by the
characterizations of completeness of [10,13]. Specifically, we show that there exist
symmetric functionalities f (i.e., where both parties get the same output) that
are not ideal-box-complete (i.e., OT cannot be implemented using an ideal box
computing f) but may be in some sense as hard to obtain as OT. In particular,
we prove the following:

Theorem 1.1 (informal). If there exists a constant-round protocol $\pi$ that securely
computes a symmetric non-trivial functionality f over a constant-size domain,
in the presence of semi-honest adversaries, then there exists an infinitely-often-OT
that is secure in the presence of semi-honest uniform adversaries.²

Needless to say, Theorem 1.1 is of interest for functionalities f that are not
complete; as we have mentioned, such functionalities exist. 

Our main observation in proving this result is that in real-world protocols, an 
ideal-box that simultaneously provides outputs to both parties does not exist. 
Rather, parties learn their outputs gradually, and hence, in any constant-round 
protocol, there must be a round in which one party learns substantial information 
before the other party does. Thus, essentially there is no difference between the 
symmetric setting (where both parties receive output and there are functions 
that are neither complete nor trivial) and the asymmetric setting (where only 
one party receives output and all functions are either trivial or complete). 
Alternative Formulation of Completeness — Existential Completeness. 
In light of the above, we propose a new definition of completeness that is not 
black box. We define the notion of an “achievable class” of a given functionality
f. Informally speaking, the achievable class of a functionality f contains all

1 We stress that the issue of simultaneity has nothing to do with fairness since we 
consider semi-honest adversaries. Rather, the important point is that both parties 
receive the same information and it is not possible for one party to learn the output 
of the function while the other does not. If this were not the case, and only one party 
receives output then the symmetric setting reduces to the asymmetric setting where 
all functionalities are either trivial or complete. 

2 Infinitely-often-OT is a protocol for computing OT for which correctness and security 
hold for infinitely many n’s (rather than for all sufficiently large n). 


Completeness for Symmetric Two-Party Functionalities - Revisited 119 


functionalities that can be securely computed, assuming that f can be securely
computed. We use this notion in the natural way in order to redefine reductions, 
and trivial and complete functionalities. Our formulation has the disadvantage 
of being completely non-constructive. However, it has the advantage of providing 
a more accurate picture regarding the hardness assumptions required for secure 
computation. 

Related Work. As we have already mentioned, completeness in secure two-party 
computation was investigated in a large body of work |2lldll()llll2ll4lllll5lldl7| . 
We discuss a few that are more relevant to our discussion. Kilian m and Kushilevitz m
consider the symmetric model and give criteria for the existence of unconditionally
secure protocols [13] and for completeness [10]. Maji et al. extended
the discussion of the symmetric model to the UC-setting. Beimel, Malkin and Mi- 
cali Q considered the asymmetric model. They prove a zero-one law for complete- 
ness vs. triviality in this model. Almost all of these works consider functions with a 
constant size domain and information-theoretic security. The only exception is [Jj 
who deals with computational security in the asymmetric model. 

2 Definitions 

2.1 Preliminaries 

A function $\mu : \mathbb{N} \to \mathbb{R}$ is negligible if for every positive polynomial $p(\cdot)$ and all
sufficiently large n it holds that $\mu(n) < 1/p(n)$. We use the abbreviation PPT to
denote probabilistic polynomial-time. For an integer $\ell$, define $[\ell] = \{1, \ldots, \ell\}$. A
probability ensemble $X = \{X(a,n)\}_{a \in \{0,1\}^*; n \in \mathbb{N}}$ is an infinite sequence of random
variables indexed by a and n. (The value a will represent the parties’ inputs and 
n the security parameter. All polynomials that we will consider will be with 
respect to the security parameter, unless explicitly stated otherwise; specifically, 
all polynomial time machines will be polynomial in the security parameter.) We 
let $\lambda$ denote the empty word.

Two ensembles $X = \{X(a,n)\}_{a \in \{0,1\}^*; n \in \mathbb{N}}$ and $Y = \{Y(a,n)\}_{a \in \{0,1\}^*; n \in \mathbb{N}}$ are
computationally indistinguishable, denoted $X \stackrel{c}{\equiv} Y$, if for every family $\{C_n\}_{n \in \mathbb{N}}$
of polynomial-size circuits, there exists a negligible function $\mu(\cdot)$ such that for
every $a \in \{0,1\}^*$ and every $n \in \mathbb{N}$,

$$\left| \Pr[C_n(X(a,n)) = 1] - \Pr[C_n(Y(a,n)) = 1] \right| < \mu(n).$$

The ensembles X and Y are computationally indistinguishable by uniform machines,
denoted $X \stackrel{c}{\equiv}_u Y$, if the above holds for every PPT distinguisher D.
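
To make the quantity inside the absolute value concrete, the following sketch estimates a distinguisher's advantage empirically by sampling. It is only an illustration: the helper `advantage` and its parameters are hypothetical, the samplers are toy distributions over a single bit, and a Monte Carlo estimate says nothing about asymptotic negligibility.

```python
import random


def advantage(distinguisher, sample_x, sample_y, trials=20000, seed=0):
    """Estimate |Pr[D(X) = 1] - Pr[D(Y) = 1]| by repeated sampling."""
    rng = random.Random(seed)
    # Count how often the distinguisher outputs 1 on samples of each ensemble.
    hits_x = sum(distinguisher(sample_x(rng)) for _ in range(trials))
    hits_y = sum(distinguisher(sample_y(rng)) for _ in range(trials))
    return abs(hits_x - hits_y) / trials


def uniform_bit(rng):
    """Toy ensemble X: a uniformly random bit."""
    return rng.randint(0, 1)


def biased_bit(rng):
    """Toy ensemble Y: a bit that equals 1 with probability 3/4."""
    return int(rng.random() < 0.75)


def output_the_bit(sample):
    """Toy distinguisher D: output the sample itself."""
    return sample
```

For these toy ensembles the true advantage of `output_the_bit` is 1/4, so the estimate should land close to that value.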


2.2 Secure Two-Party Computation and Oblivious Transfer 

We follow the standard definition of secure two-party computation for semi-honest
adversaries, as it appears in [3]. In brief, a two-party protocol $\pi$ is defined
by two interactive probabilistic polynomial-time Turing machines A and B.
The two Turing machines, called parties, have the security parameter $1^n$ as their
joint input and have private inputs, denoted x and y for A and B, respectively.
The computation proceeds in rounds. In each round of the protocol, one of the
parties is active and the other party is idle. If party $P \in \{A, B\}$ is active in round
i, then in this round P writes some value $\mathrm{Out}_P$ on its output tape, and sends
a message $m^i$ to the other party. Without loss of generality, we assume that A
is always active in the odd rounds in $\pi$ and B in the even rounds. The number
of rounds in the protocol is expressed as some function r(n) in the security
parameter (where r(n) is bounded by a polynomial).

The view of a party in an execution of the protocol contains its private input,
its random string, and the messages it received throughout this execution. The
random variable $\mathrm{View}^\pi_A(x,y,1^n)$ (respectively $\mathrm{View}^\pi_B(x,y,1^n)$) describes the view
of A (resp. B) when executing $\pi$ on inputs (x, y) (with security parameter n).
The output of an execution of $\pi$ on (x, y) (with security parameter n) is the pair
of values written on the output tapes of the parties when the protocol execution
terminates. This pair is described by the random variable
$\mathrm{Output}^\pi(x,y,1^n) = (\mathrm{Output}^\pi_A(x,y,1^n), \mathrm{Output}^\pi_B(x,y,1^n))$, where $\mathrm{Output}^\pi_P(x,y,1^n)$ is the output
of party $P \in \{A, B\}$ in this execution, and is implicit in the view of P.

In this work, we consider deterministic functionalities over a finite domain. We 
therefore provide the definition of security only for deterministic functionalities; 
see H3 for a motivating discussion regarding the definition. 

Definition 2.1 (security for deterministic functionalities). A protocol $\pi = (A, B)$
securely computes a deterministic functionality $f = (f_A, f_B)$ in the presence
of semi-honest adversaries if the following hold:

Correctness: There exists a negligible function $\mu(\cdot)$, such that for every n and
every pair of inputs x, y, it holds that

$$\Pr\left[\mathrm{Output}^\pi(x,y,1^n) = f(x,y)\right] \ge 1 - \mu(n), \tag{1}$$

where the probability is taken over the random coins of the parties.

Privacy: There exist two probabilistic polynomial-time (in the security parameter)
algorithms $S_A$, $S_B$ (called “simulators”), such that:

$$\{S_A(x, f_A(x,y), 1^n)\}_{x,y \in \{0,1\}^*; n \in \mathbb{N}} \stackrel{c}{\equiv} \{\mathrm{View}^\pi_A(x,y,1^n)\}_{x,y \in \{0,1\}^*; n \in \mathbb{N}}, \tag{2}$$
$$\{S_B(y, f_B(x,y), 1^n)\}_{x,y \in \{0,1\}^*; n \in \mathbb{N}} \stackrel{c}{\equiv} \{\mathrm{View}^\pi_B(x,y,1^n)\}_{x,y \in \{0,1\}^*; n \in \mathbb{N}}. \tag{3}$$

For most of this paper, we will consider functionalities where both parties receive 
the same output, meaning that $f_A = f_B$. We call such functions symmetric and
we denote by f(x,y) the output that both parties receive. We will also only 
consider the semi-honest model here, and therefore omit this qualification from 
hereon. 

Oblivious Transfer — Naive-OT Variant. The oblivious transfer functional- 
ity (OT) is one of the most important cryptographic primitives and is known to be 
complete for general two-party computation j 1 Dltij . There are several equivalent
versions of OT; the most common being Rabin-OT [d and 1-out-of-2 OT 0. In
this paper we use a slightly different version presented in 0 , called Naive-OT, defined
by the functionality

$$\mathrm{OT}(b, c) = \begin{cases} (\lambda, b) & \text{if } c = 1, \\ (\lambda, \lambda) & \text{otherwise,} \end{cases}$$

meaning that the sender never learns anything (recall that $\lambda$ is the empty string), and the receiver learns
the sender’s bit b if its choice-bit c equals 1, but does not learn anything if c = 0.
This is the same as Rabin-OT except that the receiver chooses whether or not to
receive the sender’s bit b. In the semi-honest model it is equivalent to Rabin-OT
(and to 1-out-of-2-OT).
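
As a small sketch, the Naive-OT functionality described above can be written as a plain function that returns the pair (sender output, receiver output). Representing the empty string by `None` and the name `naive_ot` are assumptions made for this illustration; the function models only the ideal functionality, not a secure protocol.

```python
from typing import Optional, Tuple

LAMBDA = None  # stands for the empty string (the party learns nothing)


def naive_ot(b: int, c: int) -> Tuple[Optional[int], Optional[int]]:
    """Ideal Naive-OT: returns (sender output, receiver output)."""
    if b not in (0, 1) or c not in (0, 1):
        raise ValueError("b and c must be bits")
    # The sender never learns anything; the receiver learns b only if c == 1.
    return (LAMBDA, b) if c == 1 else (LAMBDA, LAMBDA)
```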

2.3 Uniform Infinitely-Often Security 

Our main result is a proof that the existence of a constant-round protocol for 
functionalities that are neither complete nor trivial almost implies oblivious 
transfer. The “almost” in this sentence is due to the fact that we can only 
prove that it implies oblivious transfer that is secure for infinitely many n’s, 
in contrast to all sufficiently large n’s. In addition, we can only prove that the 
oblivious transfer is secure in the presence of uniform distinguishers. We therefore
need to define this weaker notion of security.

Definition 2.2 (uniform infinitely-often security). A protocol $\pi$ securely
computes a deterministic functionality f in the presence of semi-honest adversaries
with uniform infinitely-often security if there exists an infinite subset $\mathcal{N} \subseteq \mathbb{N}$
such that Equations (1), (2) and (3) hold for every $n \in \mathcal{N}$, and Equations (2)
and (3) hold with respect to uniform distinguishers.

We stress that the correctness and privacy conditions must all hold for every $n \in \mathcal{N}$
(it does not suffice to require infinitely many n’s for which each requirement
holds since it is possible that they may hold for different n’s in which case the 
function will be trivial). 

3 Our Main Technical Result 

In this section, we prove Theorem 1.1. In order to formally state the theorem
and our result, we first need to define the class of functions that we consider. 
We therefore begin with preliminaries. 

3.1 Preliminaries 

Our theorem applies to all non-trivial functionalities, as characterized by Kushile- 
vitz m- This characterization uses the notion of “decomposition” of a function. 
We now define this notion. 

Definition 3.1 (equivalence relation $\equiv$ over inputs). Let $X, Y, Z \subseteq \{0,1\}^*$,
and let $f : X \times Y \to Z$. Two inputs $x_1, x_2 \in X$ existentially coincide, denoted
$x_1 \sim x_2$, if there exists an input $y \in Y$ such that $f(x_1, y) = f(x_2, y)$. We define
an equivalence relation $\equiv$ over X to be the transitive closure of the relation $\sim$
over all $x \in X$. The relations $\sim$ and $\equiv$ are defined over Y similarly.
Definition 3.2 (strongly non-decomposable functions). A function
$g : X \times Y \to Z$ is strongly non-decomposable if it is not monochromatic, all $x \in X$
are equivalent, and all $y \in Y$ are equivalent.
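
For functions over small finite domains, Definitions 3.1 and 3.2 can be checked mechanically. The sketch below is an illustration only: it represents a function as a dictionary from input pairs to outputs, computes the classes of the transitive closure with a simple union-find, and tests the three conditions. The names `classes` and `strongly_non_decomposable` and the example tables are assumptions introduced here.

```python
from itertools import combinations


def classes(items, related):
    """Partition items by the transitive closure of a symmetric relation."""
    items = list(items)
    parent = {a: a for a in items}

    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a

    for a, b in combinations(items, 2):
        if related(a, b):
            parent[find(a)] = find(b)  # merge the two classes
    groups = {}
    for a in items:
        groups.setdefault(find(a), set()).add(a)
    return list(groups.values())


def strongly_non_decomposable(f, X, Y):
    """f maps (x, y) to an output; X and Y are the finite input domains."""
    X, Y = list(X), list(Y)
    if len({f[x, y] for x in X for y in Y}) <= 1:
        return False  # monochromatic
    # Existential coincidence over X and over Y (Definition 3.1).
    x_rel = lambda a, b: any(f[a, y] == f[b, y] for y in Y)
    y_rel = lambda a, b: any(f[x, a] == f[x, b] for x in X)
    return len(classes(X, x_rel)) == 1 and len(classes(Y, y_rel)) == 1


BITS = (0, 1)
AND = {(x, y): x & y for x in BITS for y in BITS}
OR = {(x, y): x | y for x in BITS for y in BITS}
XOR = {(x, y): x ^ y for x in BITS for y in BITS}
```

Under this check, AND and OR satisfy the definition, while XOR does not, since no two distinct inputs of XOR existentially coincide.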

We refer to [13] in order to see why this is called non-decomposable. The binary
OR and AND functions are strongly non-decomposable, as is the function $f_{KUSH}$
defined below:


        y_1   y_2   y_3
x_1      0     0     1
x_2      3     4     1
x_3      3     2     2


Fig. 1. Kushilevitz’s function $f_{KUSH}$

A strongly non-decomposable function has the property that all inputs are 
equivalent. We now define a non-decomposable function simply to be a function 
which has a subfunction that is strongly non-decomposable. 

Definition 3.3 (non-decomposable functions). A symmetric function
$f : X \times Y \to Z$ is non-decomposable if there exist $X' \subseteq X$ and $Y' \subseteq Y$ such that f
restricted to X' and Y' is strongly non-decomposable; else it is decomposable.

We remark that Kushilevitz m proved that a function is non-trivial if and only 
if it is non-decomposable. The function $f_{KUSH}$ is of particular interest since it
is neither trivial (as shown by m) nor complete (as shown by m).
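
Definition 3.3 suggests a brute-force test for constant-size domains: enumerate all sub-rectangles X' x Y' and check whether one of them is strongly non-decomposable. The sketch below is illustrative only. The helper names are hypothetical, and the 3 x 3 table `F_KUSH` is the matrix usually given for Kushilevitz's function in the literature, used here as an assumption.

```python
from itertools import combinations


def _one_class(items, related):
    """True iff the transitive closure of `related` links all items."""
    items = list(items)
    seen, stack = {items[0]}, [items[0]]
    while stack:
        a = stack.pop()
        for b in items:
            if b not in seen and related(a, b):
                seen.add(b)
                stack.append(b)
    return len(seen) == len(items)


def strongly_non_decomposable(f, X, Y):
    if len({f[x, y] for x in X for y in Y}) <= 1:
        return False  # monochromatic
    return (_one_class(X, lambda a, b: any(f[a, y] == f[b, y] for y in Y))
            and _one_class(Y, lambda a, b: any(f[x, a] == f[x, b] for x in X)))


def subsets(S):
    """All non-empty subsets of S, as tuples."""
    S = list(S)
    for r in range(1, len(S) + 1):
        yield from combinations(S, r)


def non_decomposable(f, X, Y):
    """Exhaustive search over sub-rectangles (exponential; small domains only)."""
    return any(strongly_non_decomposable(f, Xs, Ys)
               for Xs in subsets(X) for Ys in subsets(Y))


# Assumed table for f_KUSH: rows are x_1..x_3, columns are y_1..y_3.
ROWS = [[0, 0, 1], [3, 4, 1], [3, 2, 2]]
IDX = (1, 2, 3)
F_KUSH = {(x, y): ROWS[x - 1][y - 1] for x in IDX for y in IDX}

BITS = (0, 1)
XOR = {(x, y): x ^ y for x in BITS for y in BITS}
```

With these tables, `non_decomposable(F_KUSH, IDX, IDX)` holds, whereas XOR is decomposable under the same test.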

3.2 The Theorem and Proof 

Let f be a symmetric non-decomposable functionality with domain of constant
size. We show that the existence of a constant-round protocol for computing f
implies the existence of a weak variant of oblivious transfer. The idea behind the
proof is to run a protocol $\pi$ for f until the first round in which one of the parties
learns meaningful information about the input of the other party. Since this is the 
first round that something is learned and only one party can learn information in 
any single round, we have that one party has learned something and the other has 
not. This asymmetry of information suffices for us to construct oblivious transfer. 

Our proof proceeds in three stages. First, we prove that a round as described 
above exists. Intuitively, this is the case since before the protocol execution 
neither party has any information about the other party’s input, but at the end 
of the execution each party learns significant information about the other party’s 
input. Next, we show that a weak form of oblivious transfer can be constructed 
from any protocol with such a round (in actuality, we need to prove that such 
a round exists on a special subset of inputs called a minor, and we demonstrate 
this in the first step). The OT that we construct is weak in the sense that it is 
only correct with noticeable probability. Finally, we show how to boost the weak 
correctness of the OT to fully correct oblivious transfer. 
We stress that we do not actually obtain a full oblivious transfer protocol.
Rather, our protocol is only secure infinitely often; see Definition 2.2. We explain
why this is the case at the end of Section 3.3.

Theorem 3.1. If there exists a constant-round protocol $\pi$ that securely computes
a symmetric, deterministic, non-decomposable functionality f (over a constant-size
domain), then there exists a uniform infinitely-often OT protocol.

Proof: Recall that a non-decomposable functionality is a function with a sub- 
set of inputs that defines a strongly non-decomposable functionality. Since we 
consider the semi-honest model and so parties use only their prescribed inputs, 
it follows that the existence of a secure protocol for a non-decomposable func- 
tion implies the existence of a secure protocol for the strongly non-decomposable 
function defined over the appropriate subset. It thus suffices to prove the theorem 
for a strongly non-decomposable function. 

As we have described above, there are three steps in the proof of this theorem. 
In Section 3.3 we prove the first step. Specifically, in Lemma 3.1 we prove that
there exists an “exclusive revelation round”, which is a round in which one party
has learned while the other has not, and then in Lemma 3.2 we prove that such
a round must exist for inputs that form an insecure minor (defined below). We
call this an “exclusive revelation minor”. Next, in Section 3.4 we prove that the
existence of an exclusive revelation minor implies the existence of OT with weak
correctness, and finally in Section 3.5 we explain how to boost the correctness
and thus obtain full OT (with infinitely-often uniform security). □


3.3 Step 1— The Existence of an Exclusive Revelation Minor 

In order to prove our result we exploit the fact that parties obtain information 
about the output of a computation gradually and that one party learns sub- 
stantial information before the other party does. We begin with some notation 
regarding partial protocol executions. For an r-round protocol $\pi$ and a function
$\nu : \mathbb{N} \to \mathbb{N}$ such that $\nu(n) \le r(n)$ for all $n \in \mathbb{N}$, we denote by $\pi_\nu$ the
protocol obtained by halting $\pi$ after round $\nu(n)$ is completed. Specifically, the
random variables $\mathrm{View}^{\pi_\nu}_A(x,y,1^n)$ and $\mathrm{View}^{\pi_\nu}_B(x,y,1^n)$ describe the views of A
and B (respectively) in a random execution of $\pi_\nu$ on inputs (x, y) with security
parameter n.

We next formally define what it means for a party to obtain non-trivial infor- 
mation about the other party’s input. 

Definition 3.4 (distinguishing between inputs). Let $\pi$ be a c-round protocol
for computing a functionality f (where c is some function of the security
parameter n), and fix $i \in \mathbb{N}$. For a triple x, y, y' of inputs, we say that A(x)
distinguishes between y and y' at round i if there exists a polynomial $p(\cdot)$ and a
(uniform) PPT machine D such that for infinitely many n’s,

$$\left| \Pr[D(\mathrm{View}^{\pi_i}_A(x,y,1^n), 1^n) = 1] - \Pr[D(\mathrm{View}^{\pi_i}_A(x,y',1^n), 1^n) = 1] \right| \ge \frac{1}{p(n)}.$$


124 Y. Lindell, E. Omri, and H. Zarosim 


For a triple x,x',y of inputs we define that B(y) distinguishes between x and x' 
at round i in an analogous way. 

As we will see below, it is crucial that D be a uniform PPT machine, since the
parties need to run D in the OT protocol that we construct. For simplicity (and
since it suffices for our needs), the above definition considers a fixed round i.
This can be easily generalized to any (polynomial-time computable) function
$i : \mathbb{N} \to \mathbb{N}$ such that $i(n) \le c(n)$ for every n.

We now define the notion of an exclusive revelation round, which is just a 
round in which one party can distinguish inputs of the other, while the other 
cannot. Our formulation of this uses Definition 3.4.

Definition 3.5 (exclusive-revelation round). Let $\pi$ be a protocol for computing
a symmetric functionality f. Then, $\pi$ has an exclusive revelation at round
i if one of the following holds:

1. There exists a triplet x, y, y' such that A(x) distinguishes between y and y'
at round i, and B(y) does not distinguish between x and x' at round i for
any triplet x, x', y (we say that x, y, y' define the revelation round); or

2. There exists a triplet x, x', y such that B(y) distinguishes between x and x'
at round i, and A(x) does not distinguish between y and y' at round i for
any triplet x, y, y' (we say that x, x', y define the revelation round).

Protocol $\pi$ has an exclusive-revelation round if there exists $0 < i \le c$ such that $\pi$
has an exclusive revelation at round i.
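
The definition can be illustrated on a deterministic toy protocol, where "distinguishes" is replaced by the much simpler condition that two partial views differ. This is a simplification made for the sketch: the paper's notion is computational and holds for infinitely many n. The two-round protocol for AND below is hypothetical and is not private (B learns x even when y = 0); it serves only to show how views evolve round by round.

```python
from itertools import product


def toy_and_protocol(x, y):
    """Messages of a 2-round toy protocol: A sends x, then B sends x AND y."""
    return [x, x & y]  # message number i is sent in round i (1-based)


def view(party, x, y, i, protocol):
    """View after round i: own input plus the messages received so far."""
    msgs = protocol(x, y)[:i]
    if party == "A":  # A is active in odd rounds, so it receives even-round messages
        return (x, tuple(m for r, m in enumerate(msgs, 1) if r % 2 == 0))
    return (y, tuple(m for r, m in enumerate(msgs, 1) if r % 2 == 1))


def distinguishes(party, i, protocol, dom=(0, 1)):
    """True if some triplet yields different views for the party at round i."""
    if party == "A":
        return any(view("A", x, y, i, protocol) != view("A", x, y2, i, protocol)
                   for x, y, y2 in product(dom, repeat=3))
    return any(view("B", x, y, i, protocol) != view("B", x2, y, i, protocol)
               for x, x2, y in product(dom, repeat=3))


def exclusive_revelation_rounds(protocol, rounds, dom=(0, 1)):
    """Rounds at which exactly one of the two parties can tell inputs apart."""
    return [i for i in range(1, rounds + 1)
            if distinguishes("A", i, protocol, dom)
            != distinguishes("B", i, protocol, dom)]
```

In this toy protocol, B's view already depends on x after round 1 while A has received nothing, so round 1 is the only round with an exclusive revelation.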

We are now ready to prove that any constant-round protocol for computing a 
non-constant function (i.e., a function that has at least two different outputs) 
has an exclusive-revelation round. 

Lemma 3.1. Let f be a symmetric functionality that is not constant (and has
domain of constant size). Let $\pi$ be a constant-round protocol for securely computing
f. Then, $\pi$ has an exclusive-revelation round.

Proof: For every (round number) $i \le c$, every uniform PPT machine (distinguisher)
D, and every triplet x, x', y (recall that there is a constant number of
such triplets), we define

$$\delta^{D,i}_{x,x',y}(n) = \left| \Pr[D(\mathrm{View}^{\pi_i}_B(x,y,1^n), 1^n) = 1] - \Pr[D(\mathrm{View}^{\pi_i}_B(x',y,1^n), 1^n) = 1] \right|$$

and let $r^D_{x,x',y}$ be the minimal round number $0 < i \le c$ for which there exists a
polynomial $p(\cdot)$ such that $\delta^{D,i}_{x,x',y}(n) \ge 1/p(n)$ for infinitely many n’s. If no such i
exists, we let $r^D_{x,x',y} = c+1$. Note that this means that $r^D_{x,x',y}$ is the first round i such
that the PPT machine D can distinguish the ensembles $\{\mathrm{View}^{\pi_i}_B(x,y,1^n)\}_{n \in \mathbb{N}}$
and $\{\mathrm{View}^{\pi_i}_B(x',y,1^n)\}_{n \in \mathbb{N}}$.

We further define $r_{x,x',y} = \min_D \{r^D_{x,x',y}\}$ (this is well defined, as every
$r^D_{x,x',y} \in [c+1]$). Observe that this means that $r_{x,x',y}$ is the minimal round
for which there exists any uniform PPT machine that can distinguish the two
ensembles (equivalently, the minimal round for which B(y) distinguishes between
x and x'). For every triplet x, y, y', we define $r_{x,y,y'}$ analogously.

By the correctness of the protocol, for every triplet x, x', y such that
$f(x,y) \ne f(x',y)$, the view of both parties after the last round (that is, round c) implies
the output and hence there exists a uniform PPT machine D and a negligible
function $\mu(\cdot)$ such that for all sufficiently large n’s, $\delta^{D,c}_{x,x',y}(n) \ge 1 - \mu(n)$. This
in turn implies that for such triplets, there exists a PPT machine D for which
$r^D_{x,x',y} \le c$, and hence $r_{x,x',y} \le c$. Similarly, for every triplet x, y, y' such that
$f(x,y) \ne f(x,y')$, it holds that $r_{x,y,y'} \le c$. Since f is not constant, there either
exists a triplet of the former type or of the latter type.

Now, define $i^*_A = \min_{x,y,y'} \{r_{x,y,y'}\}$ and $i^*_B = \min_{x,x',y} \{r_{x,x',y}\}$. Note that
$i^*_A$ is the minimal round for which there exists a triplet x, y, y' such that A(x)
distinguishes between y and y', and $i^*_B$ is the minimal round for which there
exists a triplet x, x', y such that B(y) distinguishes between x and x'. Since f is
not constant, it holds that either $i^*_A \le c$ or $i^*_B \le c$ (or both). We claim that $\pi$
has exclusive revelation at either round $i^*_A$ or at round $i^*_B$.

Assume without loss of generality that $i^*_A \le i^*_B$; we show that $i^*_A < i^*_B$. This
suffices, since by the definition of $i^*_B$ we know that B(y)
does not distinguish between x and x' at any round $i < i^*_B$ and for any triplet
x, x', y. A crucial observation is that the view of a party does not change in
the round in which it is active, and hence, neither does its distinguishing capability.
Hence, by the minimality of $i^*_A$, it must be that B is the one sending a message
in round $i^*_A$, since otherwise A would be able to distinguish already in round
$i^*_A - 1$. This means that B's view does not change in round $i^*_A$, and hence, by
the minimality of $i^*_B$, it cannot be that $i^*_A = i^*_B$. The case that $i^*_B < i^*_A$ is dealt
with analogously. □

We complete this step of the proof by showing that when a strongly non- 
decomposable function has a protocol with an exclusive-revelation round, this 
round is defined by inputs that form an insecure minor. An insecure minor is a 
tuple of inputs x, x', y, y' such that $f(x,y) = f(x,y')$ and $f(x',y) \ne f(x',y')$
(X-minor), or $f(x,y) = f(x',y)$ and $f(x,y') \ne f(x',y')$ (Y-minor).
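
For a function over a small finite domain, insecure minors can be listed by exhaustive search. The following sketch is only an illustration of the definition; the function names `x_minors` and `y_minors` and the dictionary representation of f are assumptions introduced here.

```python
def x_minors(f, X, Y):
    """All (x, x', y, y') with f(x,y) = f(x,y') and f(x',y) != f(x',y')."""
    return [(x, x2, y, y2)
            for x in X for x2 in X for y in Y for y2 in Y
            if f[x, y] == f[x, y2] and f[x2, y] != f[x2, y2]]


def y_minors(f, X, Y):
    """All (x, x', y, y') with f(x,y) = f(x',y) and f(x,y') != f(x',y')."""
    return [(x, x2, y, y2)
            for x in X for x2 in X for y in Y for y2 in Y
            if f[x, y] == f[x2, y] and f[x, y2] != f[x2, y2]]


BITS = (0, 1)
AND = {(x, y): x & y for x in BITS for y in BITS}
XOR = {(x, y): x ^ y for x in BITS for y in BITS}
```

AND contains the X-minor (0, 1, 0, 1), since AND(0, 0) = AND(0, 1) while AND(1, 0) differs from AND(1, 1). XOR contains no insecure minor at all.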

Definition 3.6 (exclusive-revelation minor). Let $\pi$ be a protocol for computing
a symmetric functionality f. If there exists an X-minor x, x', y, y' with
respect to f such that x', y, y' define an exclusive revelation round for $\pi$, then we
say that $\pi$ has an exclusive-revelation X-minor; an exclusive-revelation Y-minor is
defined analogously. We say that $\pi$ has an exclusive-revelation minor if it has an
exclusive-revelation X-minor or an exclusive-revelation Y-minor.

The next lemma states that strongly non-decomposable functions have the prop- 
erty that the existence of an exclusive-revelation round implies the existence of 
an exclusive-revelation minor. 

Lemma 3.2. Let $\pi$ be a protocol that securely computes a strongly non-decomposable
symmetric function f with constant-size domain. If $\pi$ has an exclusive-revelation
round then it has an exclusive-revelation minor.
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Proof: The proof follows by analyzing the general structure of strongly non- 
decomposable functions. Let / be any symmetric strongly non-decomposable 
function with a constant-size domain. Assume that there exist Xj,yk,ye (with 
k < i) that define an exclusive revelation at round *; that is, A(xj) distinguishes 
between y & and ye at round i. We show that this implies that n has an exclusive 
revelation A-minor. Since / is a strongly non-decomposable function, it holds 
that yk = ye- Let y,;, , . . . , yi t be such that yk ~ y%^ ~ ~ yi t ~ ye and let 

Via = Vk and Vi t+1 = ye- A(xj) distinguishes between yi 0 and yi t+1 at round i, and 
since t is a constant (recall that / has a constant-size domain), there exists some 
h £ [t+1] such that A(xj) distinguishes between yt k _ x and yi h at round i. Now, by 
definition, since yi h _ 1 ~ yt h , there exists some x such that f(x, yi h ^ t ) = f(x, yi h ). 
Hence, x,Xj,yi h _ x ,yi h forms an exclusive-revelation A -minor. The proof for the 
case that B distinguishes is analogous. □ 
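
The pigeonhole step in this proof can be sketched numerically. Assume the distinguisher's acceptance probabilities on the views generated with the chain of inputs are given as a list (the numbers below are made up for illustration, and `best_adjacent_gap` is a hypothetical helper). By the triangle inequality, some adjacent pair must carry at least a 1/(t+1) fraction of the end-to-end gap, which is still noticeable when t is constant.

```python
from fractions import Fraction

def best_adjacent_gap(accept_probs):
    """Return (h, gap) for the adjacent pair (h-1, h) with the largest gap.

    accept_probs[j] is the probability that the distinguisher outputs 1
    on the view generated with the j-th input of the chain.
    """
    gaps = [abs(accept_probs[h] - accept_probs[h - 1])
            for h in range(1, len(accept_probs))]
    h = max(range(len(gaps)), key=gaps.__getitem__) + 1
    return h, gaps[h - 1]

# Hypothetical chain of t + 2 = 4 inputs (so t + 1 = 3 adjacent pairs).
probs = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2), Fraction(5, 8)]
h, gap = best_adjacent_gap(probs)
total = abs(probs[-1] - probs[0])  # end-to-end distinguishing gap
```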

Infinitely-Often. Observe that the existence of an exclusive revelation minor 
means that there exists an insecure minor and a round of the protocol such 
that one party can distinguish the other party’s inputs at this round while the 
other cannot. We stress that a party distinguishes inputs if it has polynomial 
advantage in guessing the input for infinitely many n’s. It would be preferable 
to prove this for all sufficiently large n’s, since this would enable us to later 
construct a fully secure oblivious transfer protocol, and not just an infinitely- 
often secure oblivious transfer protocol. However, we are unable to do this since 
we need to utilize the existence of a round where one party has learned something 
and the other has not learned anything. We prove this by taking the first such 
round, and this guarantees that in any previous round the other party has not 
learned anything, except possibly for a finite number of n’s. This means that it 
did not learn for infinitely many of the n’s in which the other party did learn, 
as required. In contrast, if we were to take the first round in which one party 
learns for all sufficiently large n’s, then it is possible that the other party has 
learned for infinitely many of these n’s in a previous round, and so security will 
not be guaranteed. 

Constant-Round. We use the assumption that π is constant-round in the proof 
that π has an exclusive-revelation round (Lemma 3.1). Recall that an 
exclusive-revelation round is the first round that a party can distinguish between the 
inputs of the other party. If the number of rounds in π is non-constant, then 
for every n the concrete number of rounds in the protocol is different and hence 
we would have to define an “exclusive-revelation function”; that is, a function 
v : ℕ → round number, that defines the first round (as a function of n) that a 
party can distinguish between the inputs of the other party. It is not clear how 
to define such a function, and moreover, how to prove its existence. 

Constant-Size Domain. We restrict ourselves to functions with constant-size 
domains (i.e., not dependent on the security parameter) in order to be consistent 
with previous works studying completeness and triviality of symmetric functions 
( jlQIHIj l. Extending the study of completeness to functions with non-constant- 
size domains is beyond the scope of this paper. 

3.4 Step 2: From an Exclusive-Revelation Minor to io-Weak-OT 

We now show that if a function has a protocol with an exclusive-revelation 
minor, then it can be used to obtain a weak version of oblivious transfer. The 
“weakness” in the OT is with respect to correctness, and not privacy. Formally: 

Definition 3.7. A protocol π is an infinitely-often uniform weak oblivious
transfer protocol (io-weak-OT) if there exists an infinite set $\mathcal{N} \subseteq \mathbb{N}$ such that
the two privacy equations of the oblivious transfer definition hold for every
$n \in \mathcal{N}$ and with respect to uniform distinguishers, and there exists a
polynomial p(·) such that the correctness equation holds with probability at least
1/2 + 1/p(n) for every $n \in \mathcal{N}$. 

We stress that the privacy requirement of the oblivious transfer (the two privacy 
equations) is identical to uniform infinitely-often security in Definition 2.2. 
However, the correctness requirement is weaker since it is only required that 
correctness holds with probability noticeably greater than 1/2, and not close to 1. 

Lemma 3.3. Let π = (A, B) be a protocol for securely computing a functionality 
f. If π has an exclusive-revelation minor, then there exists a PPT protocol $\tilde{\pi}$ that 
is an infinitely-often uniform weak oblivious transfer. 

Proof: Intuitively, the existence of an exclusive-revelation round in the protocol 
allows us (in some weak sense) to move to the realm of asymmetric functionalities 
where one party learns the output, while the other party learns nothing. It is 
known that an asymmetric functionality containing an insecure minor implies 
OT. We therefore use the insecure minor guaranteed by the hypothesis of the 
lemma to construct (a weak form of) OT in a way similar to that used in the 
world of asymmetric computation. The formal arguments follow. 

Let π be a protocol computing a symmetric functionality f. Assume without 
loss of generality that there exists an X-minor x,x',y,y' with respect to f, such 
that x',y,y' define an exclusive revelation at round i for π (the case of an 
exclusive-revelation Y-minor is analogous). That is, we have that A(x') distinguishes 
between y and y' at round i and for every triplet x,x',y, we have that B(y) 
does not distinguish between x and x' at round i. Let D be the corresponding 
distinguisher, and assume without loss of generality that it always outputs either 
0 or 1. Furthermore, since f(x,y) = f(x,y') (by definition of a minor), by the 
security of π we also have that A(x) does not distinguish between y and y' at 
round i (or any round, for that matter). It is without loss of generality (e.g., by 
interchanging y and y') to assume that for infinitely many n’s 

\[
\Pr\left[D\left(\mathrm{View}^{\pi,i}_{A}(x', y, 1^n), 1^n\right) = 1\right]
 - \Pr\left[D\left(\mathrm{View}^{\pi,i}_{A}(x', y', 1^n), 1^n\right) = 1\right]
 \ge \frac{1}{p(n)}. \tag{4}
\]

We now show how to construct an io-weak-OT protocol $\tilde{\pi}$. Before giving the 
formal description of the protocol, let us give some intuition. The idea is to run 
the protocol π on the inputs of the above minor until round i, and then to halt 
the execution. By the exclusiveness of the revelation, we are guaranteed that B 
learns nothing from the computation, hence the sender S will play the role of 
B. If the receiver R has input 0, then it will use x as its input and play the role 
of A, and hence will not learn anything (recall that f(x,y) = f(x,y') and so 
the output reveals nothing about B’s input, meaning that R learns nothing). In 
case R’s input is 1 it will use x' as its input for the computation, and will learn 
the output by distinguishing as in Equation (4). 

Regarding the sender’s input, one possibility is to have the sender use y' 
as its input for the computation in case b = 0 and y in case b = 1. The receiver 
will then output 0 or 1, depending on what the distinguisher outputs. However, 
it is possible that the distinguisher outputs 0 with probability 3/4 on input 
(x',y), and with probability 3/4 + 1/p(n) on input (x',y'). In such a case, the 
receiver will output 0 with probability 3/4 even when the output is supposed to 
be 1, and so weak correctness will not hold (recall that we need correctness with 
probability greater than 1/2). In order to overcome this, we have the sender use 
a random input in {y, y'} and therefore transfer a random bit r to the receiver 
(which in turn will try to learn r only if its input is c = 1). The sender then sends 
the receiver the bit z = r ⊕ b, and the receiver outputs z if the distinguisher 
outputs 0 and z ⊕ 1 otherwise. This has the effect of moving the error to be around 
1/2, and so we obtain correctness 1/2 + 1/p(n). 

Protocol 1 (An io-weak-OT $\tilde{\pi}$ = (S, R)) 

Inputs: The private input of the sender S is a bit b ∈ {0, 1} and the private 
input of the receiver R is a bit c ∈ {0, 1}. The common input is $1^n$, where n 
is the security parameter. 

The protocol: 

1. The sender chooses a random bit r ∈ {0, 1}. 

2. The parties start an execution of π, where the sender S plays the role of 
B and the receiver R plays the role of A. The inputs of the parties are 
set as follows: 

- The input of B (played by S) is y' if r = 0 and y if r = 1. 

- The input of A (played by R) is x if c = 0 and x' if c = 1. 

The parties halt after the i-th round of π. Let $v_A^i$ be the partial view of 
A in this partial execution of π. 

3. The sender S sends z = r ⊕ b to the receiver R. 

4. If c = 0, the receiver outputs λ. Otherwise (if c = 1), the receiver executes 
D on $v_A^i$, sets r' to be the output of D, and outputs z ⊕ r'. The sender 
always outputs λ. 

Note that the receiver is allowed to use the distinguisher D since D is a uniform 
Turing machine. 
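
The effect of the random bit r can be checked with exact arithmetic. The sketch below is only an illustration under stated assumptions: the distinguisher D is abstracted as two acceptance probabilities (its probability of outputting 1 on the view for (x', y) and for (x', y')), the concrete value 100 stands in for p(n) at one fixed n, and the function names are hypothetical. It compares the naive approach discussed above with Protocol 1 on the 3/4 example.

```python
from fractions import Fraction

def naive_correctness(p1_on_y, p1_on_y_prime, b):
    """Naive approach: sender uses y' if b = 0 and y if b = 1,
    and the receiver simply outputs D's bit."""
    return p1_on_y if b == 1 else 1 - p1_on_y_prime

def protocol1_correctness(p1_on_y, p1_on_y_prime):
    """Pr[receiver outputs b] in Protocol 1 when c = 1.

    The sender's random bit r selects y' (r = 0) or y (r = 1). The receiver
    outputs z XOR r' with z = r XOR b, so it is correct exactly when r' = r.
    """
    half = Fraction(1, 2)
    return half * (1 - p1_on_y_prime) + half * p1_on_y

p = 100  # stands in for p(n) at one fixed n (hypothetical value)
p1_on_y = Fraction(1, 4)                         # D outputs 0 w.p. 3/4 on (x', y)
p1_on_y_prime = Fraction(1, 4) - Fraction(1, p)  # D outputs 0 w.p. 3/4 + 1/p on (x', y')

naive = naive_correctness(p1_on_y, p1_on_y_prime, b=1)
masked = protocol1_correctness(p1_on_y, p1_on_y_prime)
```

With these numbers the naive receiver is correct on b = 1 with probability only 1/4, whereas the masked protocol is correct with probability 1/2 + 1/(2p) regardless of b.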

Proving the Weak-Correctness of the Protocol. Proving the correctness 
when c = 0 is trivial since both parties will always output λ as required. We 
consider the case that c = 1. We need to show that there exists a polynomial q(·) 
such that for infinitely many n’s, it holds that 

\[
\Pr\left[\mathrm{Output}^{\tilde{\pi}}_{R}(b, c = 1, 1^n) = b\right] \ge \frac{1}{2} + \frac{1}{q(n)}.
\]

We will show that this holds for the polynomial q(·) = 2p(·) and for 
all n’s for which Equation (4) is satisfied. We fix such an n. 

Recall that R outputs z ⊕ r', where z = b ⊕ r, and hence the output of R equals 
b if and only if r' = r, where r' denotes the output of D on the partial view 
$v_A^i$. Thus, it suffices to give a lower bound on the following term (recall that we 
consider the case that R uses x' since c = 1): 

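\begin{align*}
\Pr[r' = r]
 &= \Pr[r = 0] \cdot \Pr[r' = 0 \mid r = 0] + \Pr[r = 1] \cdot \Pr[r' = 1 \mid r = 1] \tag{5} \\
 &= \tfrac{1}{2} \cdot \Pr\left[D\left(\mathrm{View}^{\pi,i}_{A}(x', y', 1^n), 1^n\right) = 0\right]
  + \tfrac{1}{2} \cdot \Pr\left[D\left(\mathrm{View}^{\pi,i}_{A}(x', y, 1^n), 1^n\right) = 1\right] \\
 &= \tfrac{1}{2} \cdot \left(1 - \Pr\left[D\left(\mathrm{View}^{\pi,i}_{A}(x', y', 1^n), 1^n\right) = 1\right]\right)
  + \tfrac{1}{2} \cdot \Pr\left[D\left(\mathrm{View}^{\pi,i}_{A}(x', y, 1^n), 1^n\right) = 1\right] \\
 &= \tfrac{1}{2} + \tfrac{1}{2} \cdot \left(\Pr\left[D\left(\mathrm{View}^{\pi,i}_{A}(x', y, 1^n), 1^n\right) = 1\right]
  - \Pr\left[D\left(\mathrm{View}^{\pi,i}_{A}(x', y', 1^n), 1^n\right) = 1\right]\right).
\end{align*}

Since Equation (4) is satisfied for n, we have that 

\[
\Pr\left[D\left(\mathrm{View}^{\pi,i}_{A}(x', y, 1^n), 1^n\right) = 1\right]
 - \Pr\left[D\left(\mathrm{View}^{\pi,i}_{A}(x', y', 1^n), 1^n\right) = 1\right]
 \ge \frac{1}{p(n)}.
\]

Hence, we conclude that $\Pr[r' = r] \ge \frac{1}{2} + \frac{1}{2p(n)}$, and so correctness holds. 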

Proving the Privacy of the Protocol. We now proceed to prove that the two 
privacy equations in Definition 2.1 hold for all sufficiently large n’s (and thus, in 
particular, for the infinitely many n’s for which weak correctness holds, as required 
in Definition 2.2). Due to the lack of space in this extended abstract, we sketch 
this portion of the proof. 

Simulating the View of the Sender. We construct a PPT machine $\mathcal{S}_S$ that 
simulates the sender’s view. $\mathcal{S}_S$ receives as input the sender’s input b and the 
security parameter $1^n$, and works as follows: 

1. $\mathcal{S}_S$ chooses a random bit $r_S$ ∈ {0, 1}. 

2. $\mathcal{S}_S$ then starts an execution of π on the following inputs until the i-th round: 

- If $r_S$ = 0, the input of B is y' and if $r_S$ = 1, the input of B is y. 

- The input of A is x. 

3. $\mathcal{S}_S$ outputs $r_S$ and the partial view $v_B^i$ of B. 

The difference between the view of the sender in a real execution and in a 
simulation by $\mathcal{S}_S$ is due to the fact that $\mathcal{S}_S$ always runs A with x whereas in a real 
execution A runs with x or x' depending on the receiver’s input. Nevertheless, 
these distributions are computationally indistinguishable since i is an exclusive 
revelation round for A. This means that B learns nothing about A’s input up 
to and including round i, and in particular the view of B when A uses x is 
computationally indistinguishable from its view when A uses x'. We stress that 
the fact that i is an exclusive revelation round means that no uniform 
distinguisher given B’s view can distinguish (by the notion of distinguishing between 
inputs; Definition 3.4). This does not necessarily mean that no non-uniform 
distinguisher can distinguish; thus we only achieve privacy with respect to uniform 
distinguishers. 

Simulating the View of the Receiver. In the case that c = 1 the simulator 
receives both the sender’s and receiver’s inputs c and b and so can perfectly 
simulate the view of the receiver by just running the protocol on these inputs. 
We therefore describe the simulator only for the case that c = 0. The simulator 
$\mathcal{S}_R$ receives as input the bit c = 0, the output $\mathrm{OT}_R$ = λ of the functionality OT 
to the receiver, and the security parameter $1^n$, and works as follows: 

1. $\mathcal{S}_R$ executes π for i rounds, running A with input x and B with input y. 

2. $\mathcal{S}_R$ chooses a random bit $z_S$ ∈ {0, 1}. 

3. $\mathcal{S}_R$ outputs $z_S$ appended to the partial view $v_A^i$ of A. 

The difference between the simulated view and a real view is that in a real 
execution, the sender playing B sometimes uses y and sometimes uses y', whereas 
in the simulated execution it always uses y. In addition, the simulator sends a 
random $z_S$ that is not correlated to the value r implied by the input used by B in 
the computation of π. In order to see that this makes no difference, first observe 
that since x,x',y,y' form an insecure minor, it holds that f(x,y) = f(x,y'). 
Thus, when A has input x in an execution of π, it cannot distinguish the case 
that B used input y or y'; otherwise, A could learn something that is not revealed 
by the functionality output. Thus, the view of the receiver (who runs A) in the 
protocol execution is indistinguishable from its view in the simulation. Given 
the above, it follows that the distribution of a random bit $z_S$ is indistinguishable 
from the distribution of z = r ⊕ b by the randomness of r. This completes the 
proof. 

□ 

Uniform Security. As explained above, the privacy of the receiver is preserved 
by the exclusiveness of the revelation minor (in round i); that is, the sender 
in the OT protocol takes the role of the party that cannot distinguish the inputs 
of the other party (the one active in round i). By Definition 3.4, no uniform 
distinguisher D succeeds with non-negligible probability in distinguishing the 
two possible inputs of the receiver. This does not, however, rule out the possibility 
that a non-uniform distinguisher has noticeable success probability, leaving the 
privacy of the receiver vulnerable to non-uniform adversaries. 


3.5 From Weak Uniform io-OT to Uniform io-OT 

We conclude the proof by arguing that the existence of a uniform infinitely-often 
weak-OT implies the existence of a uniform infinitely-often OT protocol. Let $\tilde{\pi}$ be 
a uniform infinitely-often weak-OT protocol. We construct a uniform 
infinitely-often OT protocol $\hat{\pi}$ by having the parties run polynomially many executions of 
$\tilde{\pi}$ on their inputs. If c = 1, the receiver outputs the majority of the outputs of 
the receiver in $\tilde{\pi}$, and otherwise it outputs λ. It follows from the Chernoff bound 
that for the infinitely many n’s for which $\tilde{\pi}$ has weak correctness, $\hat{\pi}$ is correct 
with probability 1 − μ(n), for some negligible function μ(·). To prove the privacy 
of $\hat{\pi}$, we use multiple executions of the simulators of the io-weak-OT. A standard 
hybrid argument shows that this yields a satisfactory simulation for the io-OT 
protocol. We stress that a simple hybrid argument works because the parties are 
semi-honest and hence follow the prescribed protocol (specifically, they select 
fresh random coins for each execution). 

This completes the proof of Theorem 3.1. 
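
The majority-vote amplification can be illustrated with an exact binomial computation rather than the Chernoff bound itself. The sketch below assumes that the k executions are correct independently, each with the same probability p (which the fresh random coins of semi-honest parties are meant to give); the function name `majority_correct` and the sample parameters are hypothetical.

```python
from fractions import Fraction
from math import comb

def majority_correct(p, k):
    """Exact probability that more than half of k independent runs are correct,
    when each run is correct with probability p (use odd k to avoid ties)."""
    q = 1 - p
    return sum(comb(k, j) * p**j * q**(k - j)
               for j in range(k // 2 + 1, k + 1))

weak = Fraction(3, 5)                      # per-run correctness 1/2 + 1/10
after_3_runs = majority_correct(weak, 3)   # 81/125
after_501_runs = majority_correct(weak, 501)
```

Even a modest per-run advantage is driven close to 1 once the number of repetitions is large compared with the square of the inverse advantage.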

4 Ideal-Box and Existential Completeness 

Loosely speaking, a functionality is called complete if it can be used to securely 
compute any functionality. In the standard definitions of completeness used in 
previous works (cf. II ( II 311 \ ) , this is defined via the notion of “reduction” . Specif- 
ically g reduces to f if it is possible to securely compute g given access to /, and 
a functionality is complete if all functionalities reduce to it. In this section we 
explore in greater depth how this notion of reduction is defined and what the 
ramifications of this definition are. 

The definition of reduction in all previous works uses the notion of an ideal 
black-box for computing a functionality $f = (f_A, f_B)$. The parties A and B run a 
protocol for computing g while given access to an incorruptible trusted party who 
computes f for them throughout the execution (the parties send inputs x and y to 
the trusted party, who computes $f(x,y) = (f_A(x,y), f_B(x,y))$, and sends them 
back their respective outputs). A functionality g reduces to a functionality f if 
g is securely computable given such a trusted party for computing f. This notion 
is equivalent to the notion of oracle-aided protocols, defined in [5, Section 7.3.1]. 
Formally, using the terminology of [5], all previous definitions say that g reduces 
to f if there exists an oracle-aided protocol π that information-theoretically 
securely computes g when using the oracle functionality f (the only exception 
is jZj that considers computational security rather than information-theoretic). 
A functionality f is called complete if all g reduce to it, and it is called trivial if it 
can be information-theoretically securely computed with no oracle. We call this 
notion ideal-box completeness since the reduction is black-box in the functionality. 

The picture of completeness and triviality for the above definition is well 
known. Specifically, for the case of asymmetric functionalities where only one of 
the parties receives output, a functionality is complete if it contains an insecure 
minor, and trivial if not. Furthermore, for the case of symmetric functionalities 
where the parties receive the same output (i.e., $f_A = f_B$), a functionality is 
complete if and only if it contains an embedded OR, and is trivial if and only if 
it is decomposable (see Definition 3.3). 

Combining the above with Theorem 3.1, we have the following corollary: 

Corollary 4.1. There exist symmetric deterministic functionalities over a 
domain of constant size that are neither trivial nor ideal-box-complete, such 
that if there exists a constant-round protocol π that securely computes such a 
function, then there exists a uniform infinitely-often OT protocol. 

We remark that using the results of Kilian, one can show that any 
functionality can be securely computed with uniform infinitely-often security 
(Definition 2.2) given a uniform infinitely-often OT protocol. It therefore seems unlikely 
that such an OT protocol can be constructed under a weaker assumption than fully 
secure OT (at least, infinitely-often secure protocols are not known to be con- 
structible under weaker assumptions, and the known black-box separations for 
OT j!SI4j hold also for infinitely-often OT). 

Existential Completeness: An Alternative Formulation. Corollary 4.1 
suggests that there may exist functionalities that are neither trivial nor 
complete, and yet are in some sense complete (albeit, under the caveat of uniform 
infinitely-often security). This is due to the fact that the definition of 
ideal-box-completeness relates to the computation of f as atomic, whereas in real life, 
computation is carried out step-by-step, and in particular is not black-box in 
the functionality. We therefore present an alternative notion of completeness 
which is purely existential. Informally, our definition is based on saying that f 
“implies” g in some sense if the feasibility of securely computing g is implied by 
the feasibility of securely computing f. Formally: 

Definition 4.1. Let $\mathcal{U}$ denote the set of all polynomial-time computable 
functionalities. The achievable class of $f \in \mathcal{U}$, denoted as C(f), is the set of all $g \in \mathcal{U}$ 
such that if there exists a computationally secure protocol $\pi_f$ for computing f, 
then there exists a computationally secure protocol $\pi_g$ for computing g. 

Let $f, g \in \mathcal{U}$. We say that g existentially reduces to f if $g \in C(f)$. Functionality 
f is existentially trivial if $f \in C(f_\lambda)$ (where $f_\lambda(\cdot,\cdot) = (\lambda,\lambda)$), and is existentially 
complete if $C(f) = \mathcal{U}$. 

The above definition follows the intuition that a functionality is trivial if it can 
be securely computed “with no help” , and complete if all functionalities can be 
securely computed if it can be securely computed. We stress that if (enhanced) 
trapdoor functions exist, then all functionalities are trivial and complete by this 
definition. Nevertheless, our definition is helpful since a proof that a 
functionality f is complete (without proving the existence of enhanced trapdoor 
permutations) is essentially a proof that f requires an assumption that implies OT. 
We remark that this is the same as in the definition of (ideal-box) computa- 
tional completeness that appears in 0 . We also note that any functionality that 
is ideal-box-complete, or complete by the computation definition in 0 , is also 
existentially complete. 

We conclude by remarking that the definition of existential completeness has 
the advantage that it can more accurately map the assumptions required for 
securely computing a functionality. In particular, a function that is not complete 
cannot imply OT, something which can happen under the ideal-box definition 
(as hinted at by Corollary 4.1). However, it is also true that the definition of 
existential completeness is less helpful due to its non-constructive nature. 
Specifically, it does not enable us to prove or consider a hierarchy of functionalities, 
and a proof that $g \in C(f)$ does not necessarily tell us how to securely compute 
g, even given a protocol for securely computing f. 

Abstract. Standard constructions of garbled circuits provide only static 
security, meaning the input x is not allowed to depend on the garbled 
circuit F. But some applications, notably one-time programs (Goldwasser, 
Kalai, and Rothblum 2008) and secure outsourcing (Gennaro, Gentry, 
Parno 2010), need adaptive security, where x may depend on F. We 
identify gaps in proofs from these papers with regard to adaptive 
security and suggest the need of a better abstraction boundary. To this end 
we investigate the adaptive security of garbling schemes, an abstraction 
of Yao’s garbled-circuit technique that we recently introduced (Bellare, 
Hoang, Rogaway 2012). Building on that framework, we give definitions 
encompassing privacy, authenticity, and obliviousness, with either 
coarse-grained or fine-grained adaptivity. We show how adaptively secure 
garbling schemes support simple solutions for one-time programs and secure 
outsourcing, with privacy being the goal in the first case and 
obliviousness and authenticity the goal in the second. We give transforms that 
promote static-secure garbling schemes to adaptive-secure ones. Our work 
advances the thesis that conceptualizing garbling schemes as a first-class 
cryptographic primitive can simplify, unify, or improve treatments for 
higher-level protocols. 

1 Introduction 

Overview. Yao’s garbled-circuit technique 0, EH 0. 0. EH has been ex- 
tremely influential, engendering an enormous number of applications. Yet, at 
least in its conventional form, the technique provides only static security. Some 
applications, notably one-time programs [13] and secure outsourcing j§], require 
adaptive security.¹ In such cases Yao’s technique can be enhanced in ad hoc 
ways, and the enhanced protocol incorporated into the higher-level application. 

This paper provides a different approach. We create an abstraction for the 
goal of adaptively secure garbling. Via a single abstraction, we support a variety 
of applications in a simple and modular way. Let’s look at two of the applications 
that motivate our work. 

Two applications. One-time programs are due to Goldwasser, Kalai, and 
Rothblum (GKR) 0 - The authors aim to compile a program into one that 

1 In speaking of adversaries or security, non-adaptive and dynamic are common syn- 
onyms for what we are here calling static and adaptive. 

X. Wang and K. Sako (Eds.): ASIACRYPT 2012, LNCS 7658, pp. 134- |T551 2012. 
can be executed just once, on an input of the user’s choice. Unachievable in 
any “standard” model of computation, GKR assume what they call one-time 
memory. Their solution makes crucial use of Yao’s garbled-circuit technique. 
Recognizing that this does not support adaptive queries, GKR embellish the 
method by a technique involving output-masking and n-out-of-n secret sharing. 

In a different direction, secure outsourcing was formalized and investigated by 
Gennaro, Gentry, and Parno (GGP). Here a client transforms a function f 
into a function F that is handed to a worker. When, later, the client would like 
to evaluate f at x, he should be able to quickly map x to a garbled input X 
and give this to the worker, who will compute and return Y = F(X). The client 
must be able to quickly reconstruct from this y = f(x). He should be sure that 
the correct value was computed — the computation is verifiable — while the server 
shouldn’t learn anything significant about x, including f(x). GGP again make 
use of circuit garbling, and they again realize that they need something from 
it — its authenticity — that is a novum for this domain. 

Issues. Assuming the existence of a one-way function, GKR 0 claim that their 
construction turns a (statically-secure) garbled circuit into a secure one-time 
program. We point to a gap in their proof, namely, the absence of a reduction 
showing that their simulator works based on the one-way function assumption. 
By presenting an example of a statically-secure garbled circuit that, under their 
transform, yields a program that is not one-time, we also show that the gap 
cannot be filled without changing either the construction or the assumption. 
The problem is that the GKR transform fails to ensure adaptive security of 
garbled circuits under the stated assumption. 

Lindell and Pinkas (LP) 0] prove static security of a version of Yao’s pro- 
tocol assuming a semantically secure encryption scheme satisfying some extra 
properties (an elusive and efficiently verifiable range). GGP j§] build a one-time 
outsourcing scheme from the LP protocol, claiming to prove its security based 
on the same assumption as used in LP. We point to a gap in this proof arising 
from an implicit assumption of adaptive security of the LP construction. 

We do not believe these are major problems for either work. In both cases, 
alternative ways to establish the authors’ main results already existed. Goyal, 
Ishai, Sahai, Venkatesan and Wadia 0 present an unconditional one-time com- 
piler (no complexity- theoretic assumption is used at all), while Chung, Kalai 
and Vadhan pj present secure outsourcing schemes based solely on FHE (gar- 
bled circuits are not employed) . Our interpretation of the stated gaps is that they 
are symptoms of something else — a missing abstraction boundary. As recently 
argued by Bellare, Hoang and Rogaway (BHR) 0, it is useful and simplifying 
to see garbling not just as a technique, but as a first-class primitive. To do so, 
our earlier work defines syntax and security notions for garbling schemes, pro- 
vides proven-correct solutions, then solves some example higher-level problems 
by employing a garbling scheme that satisfies the appropriate definition. But 
the security notions of BHR do not go far enough to handle what GKR or GGP 
need, since BHR deal only with static notions of security. The applications we 
point to motivate the study of adaptive security for garbling schemes, while the 
gaps indicate that the issues may be more subtle than recognized. 

Of course we communicated our findings to the GKR and GGP authors. GKR 
responded after a few weeks with an updated manuscript G3- It modifies the 
claim from their original paper 0 to now claim that their transform works 
under the stronger assumption of a sub-exponentially hard one-way function. 
(This allows “complexity-leveraging,” where a static adversary can guess the in- 
put that will be used by an adaptive adversary with a probability that, although 
exponentially-small, is enough under the stronger assumption.) GGP responded 
to acknowledge the gap and suggest that they would address it by assuming 
the LP construction, or some related realization of Yao’s idea, already provides 
adaptive security. 

Definitions. We now discuss our contributions in more depth. We start from 
the abstraction of a garbling scheme — the raw syntax — introduced by BHR 0 . 
That work gave multiple definitions sitting on top of this syntax, but all were for 
static adversaries, in the sense that the function f to garble and its input x are 
selected at the same time. We extend the definitions to adaptive ones, 
considering two flavors of adaptive security. With coarse-grained adaptive security the 
input x can depend on the garbled function F but x itself is atomic, provided 
all at once. With fine-grained adaptive security not only may x depend on the 
garbled function F, but individual bits of x can depend on the “tokens” the 
adversary has so-far learned.² We will see that coarse-grained adaptive security 
is what’s needed for GGP’s approach to secure outsourcing, while fine-grained 
adaptive security is what’s needed for GKR’s approach to one-time programs. 
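
The difference between the two flavors can be pictured as two interfaces to the same projective encoding. The following Python sketch models only how the input is supplied; it is not a garbling scheme and says nothing about security. All names (`make_token_pairs`, `encode_coarse`, `FineGrainedEncoder`, `adaptive_input`) are hypothetical, and the random byte strings merely stand in for the tokens of a real scheme.

```python
import secrets

def make_token_pairs(n, token_bytes=16):
    """Sample 2n placeholder tokens (X_i^0, X_i^1), one pair per input bit."""
    return [(secrets.token_bytes(token_bytes), secrets.token_bytes(token_bytes))
            for _ in range(n)]

def encode_coarse(pairs, x_bits):
    """Coarse-grained: the whole input x is fixed, then encoded in one shot."""
    if len(x_bits) != len(pairs):
        raise ValueError("input length must match the number of token pairs")
    return [pairs[i][bit] for i, bit in enumerate(x_bits)]

class FineGrainedEncoder:
    """Fine-grained: bits are supplied one position at a time, so a later bit
    may depend on tokens already received. Each position is usable once."""

    def __init__(self, pairs):
        self._pairs = pairs
        self._used = set()

    def query(self, i, bit):
        if i in self._used:
            raise ValueError(f"position {i} was already queried")
        self._used.add(i)
        return self._pairs[i][bit]

def adaptive_input(encoder, n):
    """Toy adaptive strategy: each next bit is the low bit of the last byte
    of the token received for the previous position."""
    bits, tokens, bit = [], [], 0
    for i in range(n):
        token = encoder.query(i, bit)
        bits.append(bit)
        tokens.append(token)
        bit = token[-1] & 1
    return bits, tokens
```

In the coarse-grained interface the caller commits to every bit before seeing any token, whereas `adaptive_input` shows a caller whose later bits are functions of earlier tokens; the resulting garbled input is the same object in both cases.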

Orthogonal to adaptive security’s granularity are the security aims 
themselves. Following BHR, we consider three different notions: privacy, 
obliviousness, and authenticity. This gives rise to nine different security notions: {prv, 
obv, aut} × {static, coarse, fine}. We compactly denote these prv, prv1, prv2, 
obv, obv1, obv2, aut, aut1, aut2. Informally, when a function f gets transformed 
into a garbled function F, an encoding function e, and a decoding function d, 
privacy ensures that F, d, and X = e(x) don’t reveal anything beyond y = f(x) 
that shouldn’t be revealed; obliviousness ensures that F and X don’t reveal 
even y; and authenticity ensures that F and X don’t enable the computation of 
a valid Y ≠ F(X). Privacy is the classical requirement, while obliviousness and 
authenticity are motivated by the application to secure outsourcing. 

Our primary definitions for adaptive secrecy (prv1, prv2, obv1, obv2) are 
simulation-based. In the full version of this paper we give indistinguishability-
based counterparts as well. For static security this was already done by BHR, 
but it was not clear how to lift those definitions to the adaptive setting. 
Relations. We explore the provable-security relationships among our definitions. 
As expected, the simulation-based definitions imply indistinguishability-based 


2 Fine-grained adaptive security requires the garbling scheme be projective: the garbled 
version of each $x = x_1 \cdots x_n \in \{0,1\}^n$ must be $(X_1^{x_1}, \ldots, X_n^{x_n})$ for some vector of 
2n strings $(X_1^0, X_1^1, \ldots, X_n^0, X_n^1)$. Typical garbling schemes have this structure. 


Adaptively Secure Garbling 137 


ones (namely, prv1 ⇒ prv1.ind, prv2 ⇒ prv2.ind, obv1 ⇒ obv1.ind, and obv2 ⇒ 
obv2.ind). But none of the converse statements hold. BHR had earlier shown 
that, for the static setting, the converse statements do hold as long as the 
associated side-information function³ is efficiently invertible. In contrast, we show 
that, for adaptive privacy, this condition still won’t guarantee equivalence of 
simulation-based and indistinguishability-based notions. (For obliviousness, it is 
true that obv1.ind ⇒ obv1 and obv2.ind ⇒ obv2 if Φ is efficiently invertible.) These 
results are our main reason to focus on simulation-based definitions for adaptive 
privacy. The full version paints a complete picture of the relations among 
our basic definitions. Apart from the trivial relations (prv2 ⇒ prv1 ⇒ prv, 
obv2 ⇒ obv1 ⇒ obv, and aut2 ⇒ aut1 ⇒ aut) nothing implies anything else. 
Achieving adaptive security. Basic garbling-scheme constructions |, ED, 
ED, El either do not achieve adaptive security or present difficulties in proving 
adaptive security that we do not know how to overcome. One could give new 
constructions and directly prove them xxx1 or xxx2 secure, for xxx ∈ {prv, obv, aut}. 
An alternative is to provide generic ways to transform statically secure garbling 
schemes to adaptively secure ones. Combined with results in BHR, this would 
yield adaptively-secure garbling schemes. 

The aim of the GKR construction was exactly to add adaptive security to 
statically-secure garbled circuit constructions. We reformulate it as a transform, 
OMSS (Output Masking and Secret Sharing), aiming to turn a prv-secure gar- 
bling scheme to a prv2-secure one. We show, by counterexample, that OMSS 
does not achieve this goal. 

To give transforms that work we make two steps, first passing from static
security to coarse-grained adaptive security, and thence to fine-grained adaptive
security. We design these transformations first for privacy (prv-to-prv1,
prv1-to-prv2) and then for simultaneously achieving all three goals (all-to-all1 and
all1-to-all2). Our prv-to-prv1 transform uses a one-time-padding technique from prior work,
while our prv1-to-prv2 transform uses the secret-sharing component of OMSS.
Applications. We treat the two applications that motivated this work,
one-time programs and secure outsourcing. We show that adaptive garbling schemes
yield these applications easily and directly. Specifically, we show that a prv2
projective garbling scheme can be turned into a secure one-time program by
simply putting the garbled inputs into the one-time memory. We also show
how to easily turn an obv1+aut1-secure garbling scheme into a secure one-time
outsourcing scheme. (GGP show how to lift one-time outsourcing schemes to
many-time ones using FHE.) The simplicity of these transformations underscores
our tenet that abstracting garbling schemes and treating adaptive security for
them enables modular and rigorous applications of the garbled-circuit technique.
Basing the applications on garbling schemes also allows instantiations to inherit
efficiency features of future schemes.


3 The side-information function $\Phi$ captures what about $f$ one allows to be revealed in
its garbled counterpart $F$.
Transform           Model                  Cost                   See
prv-to-prv1         standard model         |F| + |d| + |X|        Theorem 2
prv1-to-prv2        standard model         (n+1)|X|               Theorem 3
all-to-all1         standard model         |F| + |d| + |X| + k    Theorem 5
all1-to-all2        standard model         (n+1)|X|               Theorem 6
rom-prv-to-prv1     random-oracle model    |X| + k                Full paper
rom-prv1-to-prv2    random-oracle model    |X| + nk               Full paper
rom-all-to-all1     random-oracle model    |X| + 2k               Full paper
rom-all1-to-all2    random-oracle model    |X| + nk               Full paper


Fig. 1. Achieving adaptive security. The name of each transform specifies its
relevant property. The word all means that prv, obv, and aut are all upgraded. Column
"Cost" specifies the length of the garbled input in the constructed scheme in terms
of the lengths of the input scheme's garbled function F, decoding function d, garbled
input X, number of input bits n, and security parameter k.


Applying our prv-to-prv1 and then prv1-to-prv2 transforms to the prv-secure
garbling scheme of BHR yields a prv2-secure scheme based on any one-way
function. Combining this with the above yields one-time programs based on
one-way functions, recovering the claim of GKR. Similarly, applying our
all-to-all1 transform to the obv+aut-secure scheme of BHR yields an obv1+aut1-secure
garbling scheme based on a one-way function, and combining this with the above
yields a secure one-time outsourcing scheme based on one-way functions.
Efficiency. Let us say a garbling scheme has short garbled inputs if their
length depends only on the security parameter k, the length n of f's input, and
the length m of f's output; it does not depend on the length of f. The statically
secure schemes of BHR, as with all classical garbled-circuit constructions, have
short garbled inputs. But our prv-to-prv1 and all-to-all1 transforms result in
long garbled inputs. In the ROM (random-oracle model) we are able to provide
schemes producing short garbled inputs, as illustrated in Fig. 1. Constructing
an adaptively secure garbling scheme with short garbled inputs under standard
assumptions remains open.

Short garbled inputs are particularly important for the application to secure
outsourcing, for in their absence the outsourcing scheme may fail to be
non-trivial. (Non-trivial means that the client effort is less than the effort needed
to directly compute the function.) In particular, the one-time outsourcing
scheme we noted above, derived by applying all-to-all1 to BHR, fails to be
non-trivial. ROM schemes do not fill the gap because of the use of FHE in upgrading
one-time schemes to many-time ones. Thus, a secure and non-trivial
instantiation of the GGP method is still lacking. (However, as we have noted before,
non-trivial secure outsourcing may be achieved by entirely different means.)
Further related work. Applebaum, Ishai, and Kushilevitz investigate
ideas similar to obliviousness and authenticity. Their approach to obtaining these
ends from privacy can be lifted and formalized in our settings; one could
specify transforms prv1-to-all1 and prv2-to-all2, effectively handling the constructive
story "horizontally" instead of "vertically." The line of work on randomized
encodings that the same authors have been at the center of provides an alternative
to garbling schemes but lacks the granularity to speak of adaptive security.

Concurrent work by Kamara and Wei (KW) investigates the garbling of what
they call structured circuits and, in the process, gives definitions somewhat
resembling prv1, obv1, and aut1, although circuit-based, not function-hiding,
and not allowing the adversary to specify the initial function. KW likewise draw
motivation from GKR and GGP, indicating that, in these two settings, the
adversary can choose the inputs to the computation as a function of the garbled
circuit, motivating adaptive notions of privacy and unforgeability.


2 Framework 

We now review the syntactic framework of garbling schemes from our earlier
work. See the full version for basic notation, including conventions for
randomized algorithms, code-based games, and circuits.

Garbling schemes. A garbling scheme is a five-tuple of algorithms $\mathcal{G}$ =
(Gb, En, De, Ev, ev). The first of these is probabilistic; the rest are deterministic. A
string $f$, the original function, describes the function $\mathrm{ev}(f, \cdot) : \{0,1\}^n \to \{0,1\}^m$
that we want to garble. The values $n = f.n$ and $m = f.m$ are efficiently
computable from $f$. On input $f$ and a security parameter $k \in \mathbb{N}$, algorithm Gb
returns a triple of strings $(F, e, d) \leftarrow \mathrm{Gb}(1^k, f)$. String $e$ describes an
encoding function, $\mathrm{En}(e, \cdot)$, that maps an initial input $x \in \{0,1\}^n$ to a garbled input
$X = \mathrm{En}(e, x)$. String $F$ describes a garbled function, $\mathrm{Ev}(F, \cdot)$, that maps a
garbled input $X$ to a garbled output $Y = \mathrm{Ev}(F, X)$. String $d$ describes a decoding
function, $\mathrm{De}(d, \cdot)$, that maps a garbled output $Y$ to a final output $y = \mathrm{De}(d, Y)$.
The correctness requirement is that if $f \in \{0,1\}^*$, $k \in \mathbb{N}$, $x \in \{0,1\}^{f.n}$, and
$(F, e, d) \in [\mathrm{Gb}(1^k, f)]$, then $\mathrm{De}(d, \mathrm{Ev}(F, \mathrm{En}(e, x))) = \mathrm{ev}(f, x)$. We also require
that $e$ and $d$ depend only on $k$, $f.n$, $f.m$, $|f|$ and the random coins $r$ of Gb. This
non-degeneracy requirement excludes trivial solutions.

A common design in existing garbling schemes is for $e$ to encode a list of
tokens, one pair for each bit in $x \in \{0,1\}^n$. Encoding function $\mathrm{En}(e, \cdot)$ then uses
the bits of $x = x_1 \cdots x_n$ to select from $e = (X_1^0, X_1^1, \ldots, X_n^0, X_n^1)$ the subvector
$X = (X_1^{x_1}, \ldots, X_n^{x_n})$. Formally, we say that garbling scheme $\mathcal{G}$ = (Gb, En, De,
Ev, ev) is projective if for all $f$, $x, x' \in \{0,1\}^{f.n}$, $k \in \mathbb{N}$, and $i \in [1..n]$, when
$(F, e, d) \in [\mathrm{Gb}(1^k, f)]$, $X = \mathrm{En}(e, x)$ and $X' = \mathrm{En}(e, x')$, then $X = (X_1, \ldots, X_n)$
and $X' = (X'_1, \ldots, X'_n)$ are $n$-vectors, $|X_i| = |X'_i|$, and $X_i = X'_i$ if $x$ and $x'$ have
the same $i$th bit. Let GS(proj) denote the set of all projective garbling schemes.
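
To make the syntax concrete, here is a minimal Python sketch of a projective garbling scheme for a function given as a truth table. It is an illustration only and not a construction from this paper: the token length `K`, the use of SHAKE-256 as a stand-in key-derivation function, the dictionary representation of `f`, and the helper names `_stream` and `_xor` are all assumptions made for the example, and no security claim is intended. The point is only to show the five algorithms, the correctness condition, and what "projective" means.

```python
import hashlib
import secrets
from itertools import product

K = 16  # token length in bytes; stands in for the security parameter k (assumption)


def ev(f, x):
    """Evaluate the original function: f is a dict from n-bit strings to outputs."""
    return f[x]


def _stream(tokens, nbytes):
    """Hash the selected tokens into nbytes of pad (SHAKE-256 as a stand-in KDF)."""
    return hashlib.shake_256(b"".join(tokens)).digest(nbytes)


def _xor(a, b):
    return bytes(u ^ v for u, v in zip(a, b))


def En(e, x):
    """Projective encoding: bit x_i selects one token from the i-th pair of e."""
    return [e[i][int(bit)] for i, bit in enumerate(x)]


def Gb(f, n):
    """Garble f, returning (F, e, d). Gb is the only probabilistic algorithm."""
    e = [(secrets.token_bytes(K), secrets.token_bytes(K)) for _ in range(n)]
    rows = {}
    for bits in product("01", repeat=n):
        x = "".join(bits)
        y = ev(f, x).encode()
        pad = _stream(En(e, x), K + len(y))
        rows[pad[:K]] = _xor(pad[K:], y)  # row label -> masked output
    F = dict(sorted(rows.items()))  # sort so that row order does not follow x
    return F, e, b""  # the decoding rule d is vacuous in this toy


def Ev(F, X):
    """Locate the row labelled by the tokens X and unmask it."""
    row = F[_stream(X, K)]
    return _xor(_stream(X, K + len(row))[K:], row)


def De(d, Y):
    return Y.decode()
```

Here `e` and `d` depend only on `n` and the coins of `Gb`, in the spirit of the non-degeneracy requirement, and two inputs that agree on bit i receive the same i-th token.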

Boolean circuits arise often in this work. We say that $\mathcal{G}$ = (Gb, En, De, Ev, ev)
is a circuit-garbling scheme if ev is the canonical circuit evaluation function.
Side-information functions. A garbled circuit might reveal the size of
the circuit that is being garbled, its topology, the original circuit itself, or
something else. The information that we allow to be revealed is captured by
a side-information function, $\Phi$, which deterministically maps $f$ to a string $\phi =
\Phi(f)$. We parameterize our advantage notions by $\Phi$. We require that $f.n$, $f.m$
and $|f|$ be easily determined from $\phi = \Phi(f)$. Side-information function $\Phi_{\mathrm{size}}$
maps a circuit $f = (n, m, q, A, B, G)$ to $(n, m, q)$, while $\Phi_{\mathrm{topo}}$ maps $f$ to $f^- =
\mathrm{Topo}(f) = (n, m, q, A, B)$ and $\Phi_{\mathrm{circ}}$ is the identity, $\Phi_{\mathrm{circ}}(f) = f$.

Sizes. We say that garbling scheme $\mathcal{G}$ = (Gb, En, De, Ev, ev) has short garbled
inputs if there is a polynomial $s$ such that $|\mathrm{En}(e, x)| \le s(k, f.n, f.m)$ for all $k \in \mathbb{N}$,
$f \in \{0,1\}^*$, $(F, e, d) \in [\mathrm{Gb}(1^k, f)]$, and $x \in \{0,1\}^{f.n}$. Let T be a transform that
maps a garbling scheme $\mathcal{G}$ to a garbling scheme T[$\mathcal{G}$]. We say that T preserves
short garbled inputs if T[$\mathcal{G}$] has short garbled inputs when $\mathcal{G}$ does.

Typical Yao-style constructions, including Garble1 and Garble2, have short
garbled inputs. But they are only statically secure. Keeping garbled inputs short
seems challenging for adaptive security in the standard model.


3 Privacy and One-Time Programs 

In this section we define coarse and fine-grained adaptive privacy for garbling
schemes. We show that some natural approaches to achieve these aims fail. We
provide alternatives that work. In the full paper, we provide more efficient ones in the ROM.
We apply this to get secure one-time programs.

Definitions for adaptive privacy. At the top of Fig. 2 we review the
defining game for the privacy notion from BHR. The adversary is static, in
the sense that it must commit to its initial function $f$ and its input $x$ at the same
time. Thus the latter is independent of the garbled function $F$ (and the decoding
function $d$) derived from $f$. It is natural to consider stronger privacy notions,
ones where the adversary obtains $F$ and then selects $x$. Two formulations for this
are specified in Fig. 2. We call these adaptive security. The notion in the
middle panel, denoted by prv1 in this paper, is coarse-grained adaptive security. The
notion in the bottom panel, denoted by prv2, is fine-grained adaptive security.
This notion is only applicable for projective garbling schemes.

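In detail, let $\mathcal{G}$ = (Gb, En, De, Ev, ev) be a garbling scheme and let $\Phi$ be a
side-information function. We define three simulation-based notions of privacy via
the games $\mathrm{Prv}_{\mathcal{G},\Phi,\mathcal{S}}$, $\mathrm{Prv1}_{\mathcal{G},\Phi,\mathcal{S}}$, and $\mathrm{Prv2}_{\mathcal{G},\Phi,\mathcal{S}}$ of Fig. 2. Here $\mathcal{S}$, the simulator,
is an always-terminating algorithm that maintains state across invocations. An
adversary $\mathcal{A}$ interacting with any of these games must make exactly one Garble
query. For game Prv1 it is followed by a single Input query. For game Prv2
it is followed by multiple Input queries. There, the garbling scheme must be
projective. The advantage the adversary gets is defined by

$\mathbf{Adv}^{\mathrm{prv},\Phi,\mathcal{S}}_{\mathcal{G}}(\mathcal{A}, k) = 2\Pr[\mathrm{Prv}^{\mathcal{A}}_{\mathcal{G},\Phi,\mathcal{S}}(k)] - 1$
$\mathbf{Adv}^{\mathrm{prv1},\Phi,\mathcal{S}}_{\mathcal{G}}(\mathcal{A}, k) = 2\Pr[\mathrm{Prv1}^{\mathcal{A}}_{\mathcal{G},\Phi,\mathcal{S}}(k)] - 1$
$\mathbf{Adv}^{\mathrm{prv2},\Phi,\mathcal{S}}_{\mathcal{G}}(\mathcal{A}, k) = 2\Pr[\mathrm{Prv2}^{\mathcal{A}}_{\mathcal{G},\Phi,\mathcal{S}}(k)] - 1$.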
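Game $\mathrm{Prv}_{\mathcal{G},\Phi,\mathcal{S}}$

proc Garble$(f, x)$
  $b \twoheadleftarrow \{0,1\}$
  if $x \notin \{0,1\}^{f.n}$ then return $\bot$
  if $b = 1$ then $(F, e, d) \leftarrow \mathrm{Gb}(1^k, f)$, $X \leftarrow \mathrm{En}(e, x)$
  else $y \leftarrow \mathrm{ev}(f, x)$, $(F, X, d) \leftarrow \mathcal{S}(1^k, y, \Phi(f))$
  return $(F, X, d)$

Game $\mathrm{Prv1}_{\mathcal{G},\Phi,\mathcal{S}}$

proc Garble$(f)$
  $b \twoheadleftarrow \{0,1\}$
  if $b = 1$ then $(F, e, d) \leftarrow \mathrm{Gb}(1^k, f)$
  else $(F, d) \leftarrow \mathcal{S}(1^k, \Phi(f), 0)$
  return $(F, d)$

proc Input$(x)$
  if $x \notin \{0,1\}^{f.n}$ then return $\bot$
  if $b = 1$ then $X \leftarrow \mathrm{En}(e, x)$
  else $y \leftarrow \mathrm{ev}(f, x)$, $X \leftarrow \mathcal{S}(y, 1)$
  return $X$

Game $\mathrm{Prv2}_{\mathcal{G},\Phi,\mathcal{S}}$

proc Garble$(f)$
  $b \twoheadleftarrow \{0,1\}$; $n \leftarrow f.n$; $Q \leftarrow \emptyset$; $\tau \leftarrow \varepsilon$
  if $b = 1$ then
    $(F, (X_1^0, X_1^1, \ldots, X_n^0, X_n^1), d) \leftarrow \mathrm{Gb}(1^k, f)$
  else
    $(F, d) \leftarrow \mathcal{S}(1^k, \Phi(f), 0)$
  return $(F, d)$

proc Input$(i, c)$
  if $i \notin \{1, \ldots, n\} \setminus Q$ then return $\bot$
  $x_i \leftarrow c$; $Q \leftarrow Q \cup \{i\}$
  if $|Q| = n$ then
    $x \leftarrow x_1 \cdots x_n$; $y \leftarrow \mathrm{ev}(f, x)$; $\tau \leftarrow y$
  if $b = 1$ then $X_i \leftarrow X_i^{x_i}$
  else $X_i \leftarrow \mathcal{S}(\tau, i, |Q|)$
  return $X_i$


Fig. 2. Three kinds of privacy: prv, prv1, prv2. Games to define the static,
coarse-grained, and fine-grained privacy of $\mathcal{G}$ = (Gb, En, De, Ev, ev). Finalize$(b')$ returns the
predicate $(b = b')$. Notation $s \twoheadleftarrow S$ denotes uniform sampling from a finite set.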


For xxx $\in$ {prv, prv1, prv2} we say that $\mathcal{G}$ is xxx-secure with respect to (or
over) $\Phi$ if for every PT adversary $\mathcal{A}$ there exists a PT simulator $\mathcal{S}$ such that
$\mathbf{Adv}^{\mathrm{xxx},\Phi,\mathcal{S}}_{\mathcal{G}}(\mathcal{A}, \cdot)$ is negligible. We let GS(xxx, $\Phi$) be the set of all garbling
schemes that are xxx-secure over $\Phi$.

Let us now explain the three games, beginning with static privacy. Here we
let the adversary select $f$ and $x$ and we do one of two things: garble $f$ to make
$(F, e, d)$ and encode $x$ to make $X$, giving the adversary $(F, X, d)$; or, alternatively,
we ask the simulator to produce a "fake" $(F, X, d)$ based only on the security
parameter $k$, the partial information $\Phi(f)$ about $f$, and the output $y = \mathrm{ev}(f, x)$.
The adversary will have to guess if the garbling was real or fake.

For coarse-grained adaptive privacy, we begin by letting the adversary pick $f$.
Either we garble it to $(F, e, d) \leftarrow \mathrm{Gb}(1^k, f)$ and give the adversary $(F, d)$, or else
we ask the simulator to devise a fake $(F, d)$ based solely on $k$ and $\phi = \Phi(f)$.
Only after the adversary has received $(F, d)$ do we ask it to provide an input $x$.
Corresponding to the two choices we either encode $x$ to $X = \mathrm{En}(e, x)$ or ask the
simulator to produce a fake $X$, assisting it only by providing $\mathrm{ev}(f, x)$.
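
The coarse-grained game can be run mechanically. The Python sketch below is a hedged illustration of game Prv1 as described above, not code from the paper: the harness `prv1_game`, the estimator `estimate_advantage`, and the deliberately leaky scheme, simulator, and adversary are all hypothetical, and functions are represented as byte-string truth tables with integer inputs. It shows how an adversary that compares the "garbled" function against its own `f` gets advantage 1 against this particular simulator, which sees only the size of `f`.

```python
import secrets


def prv1_game(Gb, En, ev, Phi, sim, adv):
    """Run game Prv1 once; return True iff the adversary's guess equals b."""
    b = secrets.randbelow(2)
    f = adv.choose_function()            # the single Garble query
    if b == 1:
        F, e, d = Gb(f)
    else:
        F, d = sim.garble(Phi(f))        # the simulator sees only Phi(f)
    x = adv.choose_input(F, d)           # the Input query, chosen after seeing (F, d)
    if not 0 <= x < len(f):
        X = None                         # invalid input: the oracle returns "bottom"
    elif b == 1:
        X = En(e, x)
    else:
        X = sim.input(ev(f, x))          # the simulator now also gets y = ev(f, x)
    return adv.guess(X) == b


def estimate_advantage(trials, *game_args):
    """Empirical version of 2 * Pr[game returns true] - 1."""
    wins = sum(prv1_game(*game_args) for _ in range(trials))
    return 2 * wins / trials - 1


# Hypothetical leaky scheme: the "garbled" function is f itself.
def leaky_Gb(f): return f, b"", b""
def leaky_En(e, x): return bytes([x])
def table_ev(f, x): return f[x]
def phi_size(f): return len(f)


class ZeroSimulator:
    """Outputs an all-zero table of the right size, then a dummy garbled input."""
    def garble(self, phi): return bytes(phi), b""
    def input(self, y): return b"\x00"


class CompareAdversary:
    """Guesses 'real' exactly when the garbled function equals its own f."""
    f = b"\x01\x00"
    def choose_function(self): return self.f
    def choose_input(self, F, d):
        self.F = F
        return 0
    def guess(self, X): return 1 if self.F == self.f else 0
```

A single adversary and simulator pair says nothing in general, since the definition quantifies over all adversaries and asks for some simulator; the harness is only meant to make the order of queries explicit.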

Coarse-grained adaptive privacy is arguably not all that adaptive, as the ad- 
versary specifies its input x all in one shot. This is unavoidable as long as 
the encoding function e operates on x atomically. But if the encoding func- 
tion e is projective, then we can dole out the garbled input component-by- 
component. Only after the adversary specifies all n bits, one by one, is the input 
fully determined. At that point the simulator is handed $y$, which might be needed
for constructing the final token $X_i$.

The OMSS transform. In the process of constructing one-time programs
from garbled circuits, GKR recognize the need for adaptive privacy of the
garbled circuits. Their construction incorporates a technique to provide it. This
technique is easily abstracted to provide, in our terminology, a transform that
aims to convert a projective, prv garbling scheme into a projective, prv2 garbling
scheme. Instead of garbling $f$ we pick $r \twoheadleftarrow \{0,1\}^m$ and garble the circuit $g$ defined
by $g(x) = f(x) \oplus r$ for every $x \in \{0,1\}^n$ where $n = f.n$ and $m = f.m$. Then
we secret share $r$ as $r = r_1 \oplus \cdots \oplus r_n$ and include $r_i$ in the $i$-th token, so that
evaluation reconstructs $r$ and it can be xored back at decoding time to recover
$\mathrm{ev}(f, x)$ as $\mathrm{ev}(g, x) \oplus r$. Intuitively, this should work because the simulator can
garble a dummy constant function with random output $s$ and does not have to
commit to $r$ until it gets the target output value $y$ of $f$ and needs to provide
the last token, at which point it can pick $r = s \oplus y$ so that $s \oplus r = y$ as desired.
Just the same, we show by counterexample that OMSS does not work, in
general, to convert a prv-secure scheme to a prv2-secure one: we present a prv
secure $\mathcal{G}$ such that OMSS[$\mathcal{G}$] is not prv2 secure. While this does not show that
OMSS fails in the context in which GKR use it, our counterexample extends to
that setting as well; see the full paper.
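
The masking and secret-sharing arithmetic behind this intuition can be sketched in a few lines of Python. This is only an illustration of the XOR bookkeeping, under the assumption that outputs and shares are byte strings; the function names `share_mask` and `equivocate_last` are hypothetical. It shows the simulator's late choice of the last share, and it says nothing about whether the resulting scheme is secure (the counterexample shows that, in general, it is not).

```python
import secrets
from functools import reduce


def xor(a, b):
    return bytes(u ^ v for u, v in zip(a, b))


def xor_all(parts, nbytes):
    """XOR a list of nbytes-long strings together (all zeros for an empty list)."""
    return reduce(xor, parts, bytes(nbytes))


def share_mask(n, nbytes):
    """Real garbling: pick shares r_1..r_n uniformly; the mask r is their XOR."""
    shares = [secrets.token_bytes(nbytes) for _ in range(n)]
    return shares, xor_all(shares, nbytes)


def equivocate_last(shares, last, s, y):
    """Simulator's final step: reset the share at index `last` so that the XOR
    of all shares equals s XOR y, where s is the dummy circuit's output."""
    others = [sh for j, sh in enumerate(shares) if j != last]
    fixed = list(shares)
    fixed[last] = xor(xor(s, y), xor_all(others, len(s)))
    return fixed
```

In the real scheme the evaluator learns $g(x) = f(x) \oplus r$ and all shares, so it recovers $f(x)$. In the simulation the evaluator learns $s$ and shares that XOR to $s \oplus y$, so it recovers $y$, and every share handed out before the last query was uniformly random.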

Now proceeding formally, we associate to circuit-garbling scheme $\mathcal{G}$ = (Gb,
En, De, Ev, ev) $\in$ GS(proj) the circuit-garbling scheme OMSS[$\mathcal{G}$] = (Gb$_2$, En$_2$, De$_2$,
Ev$_2$, ev) $\in$ GS(proj) defined at the top of Fig. 3. For simplicity we are assuming
that the decoding rule $d$ in $\mathcal{G}$ is always vacuous, meaning $d = \varepsilon$. (We do not need
non-trivial $d$ to achieve privacy, and this lets us stay closer to GKR,
whose garbled circuits have no analogue of our decoding rule.) In the code,
$g(\cdot) \leftarrow f(\cdot) \oplus r$ means that we construct from $f, r$ a circuit $g$ such that $\mathrm{ev}(g, x) =
\mathrm{ev}(f, x) \oplus r$ for all $x \in \{0,1\}^{f.n}$. (Note we can do this in such a way that
$\Phi_{\mathrm{topo}}(g) = \Phi_{\mathrm{topo}}(f)$.)

The claim under consideration is that if $\mathcal{G}$ is prv-secure relative to $\Phi = \Phi_{\mathrm{topo}}$
then $\mathcal{G}_2$ is prv2-secure relative to $\Phi = \Phi_{\mathrm{topo}}$. To prove this, we would need to
let $\mathcal{A}_2$ be an arbitrary PT adversary and build a PT simulator $\mathcal{S}_2$ such that
$\mathbf{Adv}^{\mathrm{prv2},\Phi,\mathcal{S}_2}_{\mathcal{G}_2}(\mathcal{A}_2, \cdot)$ is negligible. GKR suggest a plausible strategy for the
simulator that, in particular, explains the intuition for the transform. We present
here our understanding of this strategy adapted to our setting. In its first phase
the simulator $\mathcal{S}_2$ has input $1^k, \phi, 0$ where $\phi = \Phi(f)$, with $f$ being the query
made by the adversary to Garble. Simulator $\mathcal{S}_2$ picks $s \twoheadleftarrow \{0,1\}^m$ and lets
$f_s$ be the circuit that has output $s$ on all inputs and $\Phi_{\mathrm{topo}}(f_s) = \phi$. It also
picks random $m$-bit strings $s_1, \ldots, s_n$ and a random input $w \twoheadleftarrow \{0,1\}^n$. It lets
$(G, (X_1^0, X_1^1, \ldots, X_n^0, X_n^1), \varepsilon) \leftarrow \mathrm{Gb}(1^k, f_s)$ and returns $G$ to the adversary, saving
$\sigma = (s, s_1, \ldots, s_n)$ as state information. In the second phase, when given input
$\tau, i, j$ for $j \le n - 1$, the simulator lets $T_i \leftarrow (X_i^{w_i}, s_i)$ and returns $T_i$ to the
adversary as the token for bit $i$ of the input. In the case that $j = n$, the
simulator obtains (from $\tau$ as per our game) the output $y = \mathrm{ev}(f, x)$ of the function
on input $x$, the latter defined by the adversary's queries to Input. It now resets
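proc Gb$_2(1^k, f)$
  $n \leftarrow f.n$, $r_1, \ldots, r_n \twoheadleftarrow \{0,1\}^{f.m}$
  $r \leftarrow r_1 \oplus \cdots \oplus r_n$, $g(\cdot) \leftarrow f(\cdot) \oplus r$
  $(G, (X_1^0, X_1^1, \ldots, X_n^0, X_n^1), \varepsilon) \leftarrow \mathrm{Gb}(1^k, g)$
  for $i \in \{1, \ldots, n\}$ do
    $T_i^0 \leftarrow (X_i^0, r_i)$, $T_i^1 \leftarrow (X_i^1, r_i)$
  return $(G, (T_1^0, T_1^1, \ldots, T_n^0, T_n^1), \varepsilon)$

proc En$_2((T_1^0, T_1^1, \ldots, T_n^0, T_n^1), x)$
  $x_1 \cdots x_n \leftarrow x$
  return $(T_1^{x_1}, \ldots, T_n^{x_n})$

proc Ev$_2(G, (T_1, \ldots, T_n))$
  for $i \in \{1, \ldots, n\}$ do $(X_i, r_i) \leftarrow T_i$
  $Y \leftarrow \mathrm{Ev}(G, (X_1, \ldots, X_n))$
  $r \leftarrow r_1 \oplus \cdots \oplus r_n$
  return $(Y, r)$

proc De$_2(\varepsilon, (Y, r))$
  return $\mathrm{De}(\varepsilon, Y) \oplus r$

proc Gb$(1^k, g)$
  $(n, m) \leftarrow (g.n, g.m)$
  $(G', (Z_1^0, Z_1^1, \ldots, Z_n^0, Z_n^1), \varepsilon) \leftarrow \mathrm{Gb}'(1^k, g)$
  for $i \in \{1, \ldots, n\}$ do $V_i^0, V_i^1 \twoheadleftarrow \{0,1\}^m$
  $v_1 \cdots v_n \leftarrow v \twoheadleftarrow \{0,1\}^n$, $V \twoheadleftarrow \{0,1\}^m$
  if $n > k$ then
    $V \leftarrow \mathrm{ev}(g, v) \oplus V_1^{\bar{v}_1} \oplus \cdots \oplus V_n^{\bar{v}_n}$
  for $i \in \{1, \ldots, n\}$ do
    $X_i^0 \leftarrow (Z_i^0, V_i^0)$, $X_i^1 \leftarrow (Z_i^1, V_i^1)$
  $G \leftarrow (G', v, V)$
  return $(G, (X_1^0, X_1^1, \ldots, X_n^0, X_n^1), \varepsilon)$

proc En$((X_1^0, X_1^1, \ldots, X_n^0, X_n^1), x)$
  $x_1 \cdots x_n \leftarrow x$
  return $(X_1^{x_1}, \ldots, X_n^{x_n})$

proc Ev$(G, (X_1, \ldots, X_n))$
  for $i \in \{1, \ldots, n\}$ do $(Z_i, V_i) \leftarrow X_i$
  $(G', v, V) \leftarrow G$
  return $\mathrm{Ev}'(G', (Z_1, \ldots, Z_n))$


Fig. 3. OMSS definition (top). Scheme OMSS[$\mathcal{G}$] = (Gb$_2$, En$_2$, De$_2$, Ev$_2$, ev) where
$\mathcal{G}$ = (Gb, En, De, Ev, ev). OMSS counterexample (bottom). The garbling scheme
$\mathcal{G}$ = (Gb, En, De, Ev, ev) obtained from $\mathcal{G}'$ = (Gb', En', De, Ev', ev) is prv secure when
$\mathcal{G}'$ is, but OMSS[$\mathcal{G}$] is not prv2 secure.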


$s_i = y \oplus s \oplus s_i \oplus s_1 \oplus \cdots \oplus s_n$ and returns $(X_i^{w_i}, s_i)$, so that evaluation of the
garbled function indeed results in output $y$.

This simulation strategy is intuitive, but trying to prove it correct runs into
problems. We have to show that $\mathbf{Adv}^{\mathrm{prv2},\Phi,\mathcal{S}_2}_{\mathcal{G}_2}(\mathcal{A}_2, \cdot)$ is negligible. We must
utilize the assumption of prv security to do this, which means we must perform
a reduction. The only plausible path towards this is to construct from $\mathcal{A}_2$ an
adversary $\mathcal{A}$ against the prv-security of $\mathcal{G}$ and then exploit the existence of a
simulator $\mathcal{S}$ such that $\mathbf{Adv}^{\mathrm{prv},\Phi,\mathcal{S}}_{\mathcal{G}}(\mathcal{A}, \cdot)$ is negligible. However, it is not clear
how to construct $\mathcal{A}$, let alone how its simulator comes into play. (As we will
see when proving our transforms, the proof template that works is different, not
trying first to build $\mathcal{S}_2$, but instead building $\mathcal{A}$ from $\mathcal{A}_2$ and then $\mathcal{S}_2$ from $\mathcal{S}$.)

The problem turns out to be more than technical, for we will see that the
transform itself does not work in general. By this we mean that we can exhibit
a (projective) circuit-garbling scheme $\mathcal{G}$ = (Gb, En, De, Ev, ev) that is prv-secure
relative to $\Phi = \Phi_{\mathrm{topo}}$ but the transformed scheme $\mathcal{G}_2$ = OMSS[$\mathcal{G}$] is subject to
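proc Gb$_1(1^k, f)$
  $(F, e, d) \leftarrow \mathrm{Gb}(1^k, f)$
  $F' \twoheadleftarrow \{0,1\}^{|F|}$, $d' \twoheadleftarrow \{0,1\}^{|d|}$
  $F_1 \leftarrow F \oplus F'$, $d_1 \leftarrow d \oplus d'$
  $e_1 \leftarrow (e, d', F')$
  return $(F_1, e_1, d_1)$

proc En$_1(e_1, x)$
  $(e, d', F') \leftarrow e_1$, $X \leftarrow \mathrm{En}(e, x)$
  return $(X, d', F')$

proc Ev$_1(F_1, X_1)$
  $(X, d', F') \leftarrow X_1$, $F \leftarrow F_1 \oplus F'$
  $Y \leftarrow \mathrm{Ev}(F, X)$
  return $(Y, d')$

proc De$_1(d_1, Y_1)$
  $(Y, d') \leftarrow Y_1$, $d \leftarrow d_1 \oplus d'$
  return $\mathrm{De}(d, Y)$

proc Gb$_2(1^k, f)$
  $(F, e, d) \leftarrow \mathrm{Gb}_1(1^k, f)$
  $(X_1^0, X_1^1, \ldots, X_n^0, X_n^1) \leftarrow e$
  $N \leftarrow |\mathrm{En}_1(e, 0^n)|$
  for $i \in \{1, \ldots, n\}$ do
    $Z_i \twoheadleftarrow \{0,1\}^{|X_i^0|}$, $S_i \twoheadleftarrow \{0,1\}^N$
  $Z \leftarrow (Z_1, \ldots, Z_n)$
  $S_n \leftarrow Z \oplus S_1 \oplus \cdots \oplus S_{n-1}$
  for $i \in \{1, \ldots, n\}$ do
    $T_i^0 \leftarrow (X_i^0 \oplus Z_i, S_i)$, $T_i^1 \leftarrow (X_i^1 \oplus Z_i, S_i)$
  return $(F, (T_1^0, T_1^1, \ldots, T_n^0, T_n^1), d)$

proc En$_2(e_2, x)$
  $(T_1^0, T_1^1, \ldots, T_n^0, T_n^1) \leftarrow e_2$
  $x_1 \cdots x_n \leftarrow x$
  return $(T_1^{x_1}, \ldots, T_n^{x_n})$

proc Ev$_2(F, X_2)$
  $((U_1, S_1), \ldots, (U_n, S_n)) \leftarrow X_2$
  $Z \leftarrow S_1 \oplus \cdots \oplus S_n$
  $(Z_1, \ldots, Z_n) \leftarrow Z$
  $X \leftarrow (U_1 \oplus Z_1, \ldots, U_n \oplus Z_n)$
  return $\mathrm{Ev}_1(F, X)$


Fig. 4. Transform prv-to-prv1 (top): Scheme $\mathcal{G}_1$ = (Gb$_1$, En$_1$, De$_1$, Ev$_1$, ev) $\in$
GS(prv1, $\Phi$) obtained by applying the prv-to-prv1 transform to $\mathcal{G}$ = (Gb, En, De, Ev,
ev) $\in$ GS(prv, $\Phi$). Transform prv1-to-prv2 (bottom): Projective garbling scheme
$\mathcal{G}_2$ = (Gb$_2$, En$_2$, De, Ev$_2$, ev) $\in$ GS(prv2, $\Phi$) obtained by applying the prv1-to-prv2
transform to projective garbling scheme $\mathcal{G}_1$ = (Gb$_1$, En$_1$, De$_1$, Ev$_1$, ev) $\in$ GS(prv1, $\Phi$).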


an attack showing that it is not prv2 secure. This means, in particular, that the 
above simulation strategy does not in general work. 

To carry this out, we start with an arbitrary projective circuit-garbling scheme
$\mathcal{G}'$ = (Gb', En', De, Ev', ev) assumed to be prv-secure relative to $\Phi = \Phi_{\mathrm{topo}}$. We
then transform it into the projective circuit-garbling scheme $\mathcal{G}$ = (Gb, En, De, Ev,
ev) shown at the bottom of Fig. 3. (We assume the decoding rule of $\mathcal{G}'$ is vacuous,
a feature inherited by $\mathcal{G}$. We are letting $\bar{v}$ denote the bitwise complement of a
string $v$.) The following proposition, whose proof is in the full paper, says
that $\mathcal{G}$ continues to be prv-secure but an attack shows that OMSS[$\mathcal{G}$] is not
prv2-secure. (The proof shows it is in fact not even prv1 secure.)

Proposition 1. Let ev be the canonical circuit-evaluation function. Assume
$\mathcal{G}'$ = (Gb', En', De, Ev', ev) $\in$ GS(prv, $\Phi_{\mathrm{topo}}$) $\cap$ GS(proj) and let $\mathcal{G}$ = (Gb, En, De,
Ev, ev) $\in$ GS(proj) be the garbling scheme shown at the bottom of Fig. 3. Then
(1) $\mathcal{G} \in$ GS(prv, $\Phi_{\mathrm{topo}}$) $\cap$ GS(proj), but (2) OMSS[$\mathcal{G}$] $\notin$ GS(prv2, $\Phi_{\mathrm{topo}}$).

Achieving prv1 security. We now describe a transform prv-to-prv1 that
successfully turns a prv secure circuit garbling scheme into a prv1 secure one.
Combined with established results, this yields prv1-secure schemes based on
standard assumptions. The idea is to use one-time pads to mask $F$ and $d$, and
then append the pads to $X$. This will ensure that the adversary learns nothing
about $F$ and $d$ until it fully specifies function $f$ and $x$. Given a (not necessarily
projective) garbling scheme $\mathcal{G}$ = (Gb, En, De, Ev, ev), the prv-to-prv1 transform
returns the garbling scheme prv-to-prv1[$\mathcal{G}$] = (Gb$_1$, En$_1$, De$_1$, Ev$_1$, ev) at the top
of Fig. 4. We claim:

Theorem 2. For any $\Phi$, if $\mathcal{G} \in$ GS(prv, $\Phi$) then prv-to-prv1[$\mathcal{G}$] $\in$ GS(prv1, $\Phi$).
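
The mechanics of this transform are easy to see in code. The Python sketch below wraps an arbitrary scheme whose $F$ and $d$ are byte strings; the wrapper name `prv_to_prv1` and the placeholder base scheme (`base_Gb` and friends, with functions given as byte-string truth tables) are assumptions made for illustration, and the placeholder base scheme is not private. Only the padding logic is meant to mirror the transform described above.

```python
import secrets


def xor(a, b):
    return bytes(u ^ v for u, v in zip(a, b))


def prv_to_prv1(Gb, En, De, Ev):
    """Wrap a scheme so that F and d are one-time padded and the pads travel with X."""
    def Gb1(f):
        F, e, d = Gb(f)
        F_pad = secrets.token_bytes(len(F))
        d_pad = secrets.token_bytes(len(d))
        return xor(F, F_pad), (e, d_pad, F_pad), xor(d, d_pad)

    def En1(e1, x):
        e, d_pad, F_pad = e1
        return (En(e, x), d_pad, F_pad)      # garbled input now carries both pads

    def Ev1(F1, X1):
        X, d_pad, F_pad = X1
        return (Ev(xor(F1, F_pad), X), d_pad)

    def De1(d1, Y1):
        Y, d_pad = Y1
        return De(xor(d1, d_pad), Y)

    return Gb1, En1, De1, Ev1


# Placeholder base scheme (hypothetical, NOT secure): f is a byte-string truth table.
def base_Gb(f):
    d = secrets.token_bytes(1)
    return bytes(v ^ d[0] for v in f), b"", d

def base_En(e, x): return bytes([x])
def base_Ev(F, X): return bytes([F[X[0]]])
def base_De(d, Y): return Y[0] ^ d[0]
```

Note how the garbled input grows to $|F| + |d| + |X|$, which is the cost listed for prv-to-prv1 in Fig. 1 and the reason this transform does not preserve short garbled inputs.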

The intuition behind the prv-to-prv1 transform (outlined above) is simple, but
the proof template is instructive in indicating how to move from the intuition
to a formal proof. Given any PT adversary $\mathcal{A}_1$ against the prv1-security of $\mathcal{G}_1$
we build a PT adversary $\mathcal{A}$ against the prv-security of $\mathcal{G}$. Now the assumption
of prv-security yields a PT simulator $\mathcal{S}$ for $\mathcal{A}$ such that $\mathbf{Adv}^{\mathrm{prv},\Phi,\mathcal{S}}_{\mathcal{G}}(\mathcal{A}, \cdot)$ is
negligible. Now we build from $\mathcal{S}$ a PT simulator $\mathcal{S}_1$ such that for all $k \in \mathbb{N}$ we
have $\mathbf{Adv}^{\mathrm{prv1},\Phi,\mathcal{S}_1}_{\mathcal{G}_1}(\mathcal{A}_1, k) \le \mathbf{Adv}^{\mathrm{prv},\Phi,\mathcal{S}}_{\mathcal{G}}(\mathcal{A}, k)$. This yields the theorem. In the
full paper we provide a full proof that shows how to build $\mathcal{A}$ and $\mathcal{S}_1$.
Achieving prv2 security. Next we show how to transform a prv1 scheme
into a prv2 one. Formally, given a projective garbling scheme $\mathcal{G}$ = (Gb, En, De,
Ev, ev) $\in$ GS(prv1, $\Phi$), the prv1-to-prv2 transform returns the projective garbling
scheme prv1-to-prv2[$\mathcal{G}$] = (Gb$_2$, En$_2$, De, Ev$_2$, ev) shown at the bottom of Fig. 4.
The idea is to mask the garbled input and then use the second part of GKR's
idea as represented by OMSS, namely secret-share the mask, putting a piece
in each token, so that unless one has all tokens, one learns nothing about the
garbled input. The formal proof of the following is in the full paper.

Theorem 3. For any $\Phi$, if $\mathcal{G}_1 \in$ GS(prv1, $\Phi$) $\cap$ GS(proj) then prv1-to-prv2[$\mathcal{G}_1$] $\in$
GS(prv2, $\Phi$) $\cap$ GS(proj).
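
The token processing in this transform can be sketched as follows. The Python below covers only the part that Gb$_2$ adds on top of Gb$_1$ (masking each token pair and secret-sharing the mask) and the matching unmasking step of Ev$_2$. The function names `mask_and_share` and `unmask` and the byte-string token representation are assumptions made for the example.

```python
import secrets
from functools import reduce


def xor(a, b):
    return bytes(u ^ v for u, v in zip(a, b))


def mask_and_share(e):
    """Given token pairs e = [(X_i^0, X_i^1), ...], return the pairs (T_i^0, T_i^1)."""
    n = len(e)
    Z = [secrets.token_bytes(len(x0)) for x0, _ in e]         # one mask Z_i per position
    z = b"".join(Z)                                           # Z as a single N-byte string
    S = [secrets.token_bytes(len(z)) for _ in range(n - 1)]   # shares S_1..S_{n-1}
    S.append(reduce(xor, S, z))                               # S_n = Z xor S_1 xor ... xor S_{n-1}
    return [((xor(x0, Zi), Si), (xor(x1, Zi), Si))
            for (x0, x1), Zi, Si in zip(e, Z, S)]


def En2(e2, x):
    """Projective encoding over the new tokens."""
    return [e2[i][int(bit)] for i, bit in enumerate(x)]


def unmask(X2):
    """Recover the original tokens from all n new tokens (U_i, S_i)."""
    z = reduce(xor, [s for _, s in X2])                       # Z = S_1 xor ... xor S_n
    tokens, pos = [], 0
    for u, _ in X2:
        tokens.append(xor(u, z[pos:pos + len(u)]))            # X_i = U_i xor Z_i
        pos += len(u)
    return tokens
```

Each new token carries an $N$-byte share, so the garbled input has length $(n+1)N$, matching the $(n+1)|X|$ cost in Fig. 1. With fewer than $n$ tokens, the shares seen so far are uniformly random and the mask stays hidden.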

One-time compilers. Starting from garbling schemes with prv2 security,
we give simple designs, and proofs, for one-time programs. We begin with the
definitions. Following GKR, the intent is that possession of a one-time
program $P$ for a function $f$ should enable one to evaluate $f$ at any single value $x$;
but, beyond that, the one-time program should be useless. Unachievable in any
standard model of computation (where possession of $P$ would enable its repeated
evaluation at multiple points), GKR suggest achieving one-time programs in a
model of computation that provides one-time memory: tamper-resistant
hardware whose read-once $i$-th location returns, on query $(i, b) \in \mathbb{N} \times \{0,1\}$, the
string $T_i^b$, immediately thereafter expunging $T_i^{1-b}$. A one-time compiler
probabilistically transforms the description of a function $f$ into a one-time program $P$
and its associated one-time memory $T$.

For a formal treatment, we begin by specifying two stateful oracles; see Fig. 5.
The first, OTP$_f$, formalizes the desired behavior of a one-time program for $f$.
Here $f$ will now be regarded as a string, not a function, but this string represents
a circuit computing a function $\mathrm{ev}(f, \cdot) : \{0,1\}^{f.n} \to \{0,1\}^{f.m}$; we write ev for
the canonical circuit-evaluation function. The agent calling out to OTP$_f$
proc OTP$_f(x)$
  if $x \notin \{0,1\}^{f.n}$ then ret $\bot$
  if called then ret $\bot$
  called $\leftarrow$ true
  ret $\mathrm{ev}(f, x)$

proc OTM$_T(i, b)$
  $(T_1^0, T_1^1, \ldots, T_\ell^0, T_\ell^1) \leftarrow T$
  if $i \notin [1..\ell]$ or used$_i$ or $b \notin \{0,1\}$ then ret $\bot$
  used$_i \leftarrow$ true
  ret $T_i^b$


Fig. 5. Oracles model one-time programs and one-time memory. Oracle OTP
depends on a string $f$ representing a boolean circuit. Oracle OTM depends on a list of
strings $T$.


provides $x$ and, on the first query, it gets $\mathrm{ev}(f, x)$. Subsequent queries return
nothing. On the right-hand side of Fig. 5 we similarly define an oracle OTM$_T$,
this to model possession of a one-time-memory system. Given a list of $\ell$ pairs of
strings (establish some convention so that every string $T$ is regarded as denoting
a list of $\ell$ pairs of strings, for some $\ell \in \mathbb{N}$) the oracle returns at most one string
from each pair, otherwise satisfying each request.
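
The two oracles are simple state machines. A minimal Python sketch, assuming inputs are strings over "0" and "1", the function is given as a Python callable, and `None` plays the role of $\bot$ (the class and method names are hypothetical):

```python
class OneTimeProgramOracle:
    """OTP_f: answers the first valid query with ev(f, x), then goes silent."""

    def __init__(self, f, n):
        self.f, self.n, self.called = f, n, False

    def query(self, x):
        if len(x) != self.n or set(x) - {"0", "1"}:
            return None              # malformed input does not consume the query
        if self.called:
            return None
        self.called = True
        return self.f(x)


class OneTimeMemory:
    """OTM_T: releases at most one string from each of the pairs in T."""

    def __init__(self, pairs):
        self.pairs = list(pairs)
        self.used = [False] * len(self.pairs)

    def read(self, i, b):
        if not 1 <= i <= len(self.pairs) or self.used[i - 1] or b not in (0, 1):
            return None
        self.used[i - 1] = True      # the other string of pair i is now unreachable
        return self.pairs[i - 1][b]
```

As in Fig. 5, locations are indexed from 1, and a rejected query leaves the state unchanged.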

Elaborating on GKR, we now define a one-time compiler as a pair of
probabilistic algorithms $\Pi$ = (Co, Ex) (for compile and execute). Algorithm Co, on
input $1^k$ and a string $f$, produces a pair $(P, T) \leftarrow \mathrm{Co}(1^k, f)$ where $P$ (the
one-time program) is a string and $T$ (the one-time memory) encodes a list of $2\ell$
strings, for some $\ell$. Algorithm Ex, on input of strings $P$ and $x$, and given access
to an oracle $O$, returns a string $y \leftarrow \mathrm{Ex}^{O}(P, x)$. We require the following
correctness condition of $\Pi$ = (Co, Ex): if $(P, T) \leftarrow \mathrm{Co}(1^k, f)$ and $x \in \{0,1\}^{f.n}$ then
$\mathrm{Ex}^{\mathrm{OTM}_T(\cdot,\cdot)}(P, x) = \mathrm{ev}(f, x)$.

The security of $\Pi$ = (Co, Ex) will be relative to a side-information function
$\Phi$; the value $\phi = \Phi(f)$ captures the information about $f$ that $P$ is allowed to
reveal. So fix a one-time compiler $\Pi$ = (Co, Ex), an adversary $\mathcal{A}$, a security
parameter $k$, and a string $f$. (1) Consider the distribution $\mathrm{Real}_{\Pi,\mathcal{A},f}(k)$
determined by the following experiment: first, sample $(P, T) \leftarrow \mathrm{Co}(1^k, f)$; then, run
$\mathcal{A}^{\mathrm{OTM}_T(\cdot,\cdot)}(1^k, P)$ and output whatever $\mathcal{A}$ outputs. (2) Alternatively, fix a
one-time compiler $\Pi$ = (Co, Ex), an information function $\Phi$, a simulator $\mathcal{S}$, a security
parameter $k$, and a string $f$. Consider the distribution $\mathrm{Fake}_{\Pi,\Phi,\mathcal{S},f}(k)$ determined
by the following experiment: run $\mathcal{S}^{\mathrm{OTP}_f(\cdot)}(1^k, \Phi(f))$ and output whatever $\mathcal{S}$
outputs. For $\mathcal{D}$ an algorithm and $\Pi$, $\Phi$, $\mathcal{A}$, $\mathcal{S}$, and $k$ as above, let

$\mathbf{Adv}^{\mathrm{otc}}_{\Pi,\Phi,\mathcal{A},\mathcal{S},\mathcal{D}}(k) = \Pr[(f, \sigma) \leftarrow \mathcal{D}(1^k);\ v \leftarrow \mathrm{Real}_{\Pi,\mathcal{A},f}(k) : \mathcal{D}(\sigma, v) \Rightarrow 1]$
$\qquad - \Pr[(f, \sigma) \leftarrow \mathcal{D}(1^k);\ v \leftarrow \mathrm{Fake}_{\Pi,\Phi,\mathcal{S},f}(k) : \mathcal{D}(\sigma, v) \Rightarrow 1]$

One-time compiler $\Pi$ is said to be (OTC-) secure with respect to side-information
function $\Phi$ if for any PPT adversary $\mathcal{A}$ there is a PPT simulator $\mathcal{S}$ such that for
all PPT distinguishers $\mathcal{D}$, function $\mathbf{Adv}^{\mathrm{otc}}_{\Pi,\Phi,\mathcal{A},\mathcal{S},\mathcal{D}}(k)$ is negligible.
Constructing an OTC from a garbling scheme. A circuit-garbling
scheme $\mathcal{G}$ = (Gb, En, De, Ev, ev) can be turned into a one-time compiler $\Pi$ =
(Co, Ex) in a natural way: let OTC[$\mathcal{G}$] = (Co, Ex) be defined as follows. (1)
$\mathrm{Co}(1^k, f)$: let $(F, e, d) \leftarrow \mathrm{Gb}(1^k, f)$ and return $(P, T)$ where $P = (F, d)$ and $T = e$.
(2) $\mathrm{Ex}^{O}(P, x)$: let $(F, d) \leftarrow P$, let $x_1 \cdots x_n \leftarrow x$, query oracle $O$ on $(1, x_1)$,
$\ldots, (n, x_n)$ to obtain $X_1, \ldots, X_n$, respectively, and return $\mathrm{De}(d, \mathrm{Ev}(F, X))$ with
$X = (X_1, \ldots, X_n)$. The proof of the following is in the full paper.

Theorem 4. If $\mathcal{G}$ is a prv2-secure garbling scheme over side-information
function $\Phi$ then OTC[$\mathcal{G}$] is OTC-secure with respect to side-information $\Phi$.
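
The compiler is short enough to write out. The Python sketch below follows the two steps (1) and (2) above, with the one-time memory modelled by a small class. The placeholder projective scheme `Gb`, `Ev`, `De` is hypothetical and offers no privacy (its table stores outputs in the clear); it is there only so that the example runs. Functions are given as truth-table dictionaries, and `None` stands for $\bot$.

```python
import secrets


class OneTimeMemory:
    """OTM_T: releases at most one token from each pair."""

    def __init__(self, pairs):
        self.pairs, self.used = list(pairs), [False] * len(pairs)

    def read(self, i, b):
        if not 1 <= i <= len(self.pairs) or self.used[i - 1] or b not in (0, 1):
            return None
        self.used[i - 1] = True
        return self.pairs[i - 1][b]


# Placeholder projective scheme (hypothetical, NOT secure).
def Gb(f, n):
    e = [(secrets.token_bytes(8), secrets.token_bytes(8)) for _ in range(n)]
    F = {b"".join(e[i][int(bit)] for i, bit in enumerate(x)): y for x, y in f.items()}
    return F, e, None

def Ev(F, X): return F[b"".join(X)]
def De(d, Y): return Y


def Co(f, n):
    """Compile: the program is (F, d); the one-time memory holds the token pairs e."""
    F, e, d = Gb(f, n)
    return (F, d), e


def Ex(P, x, oracle):
    """Execute: fetch one token per input bit from the oracle, evaluate, decode."""
    F, d = P
    X = [oracle(i, int(bit)) for i, bit in enumerate(x, start=1)]
    return De(d, Ev(F, X))
```

After one execution every memory location has been read, so a second evaluation on a different input cannot obtain the tokens it needs. That is exactly the situation the prv2 game models: tokens are requested one at a time, each chosen after seeing $F$, $d$, and the earlier tokens.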

The straightforwardness of the construction and its trivial proof are, we believe, 
points in our favor, evidence of our claim that the garbling scheme abstraction 
and appropriate security notions for it engender applications in direct, simple 
and less error-prone ways. 

Separation. In the full paper, we elaborate on how Proposition 1 gives
an example of a garbling scheme $\mathcal{G}$ such that OTC[OMSS[$\mathcal{G}$]] is not OTC-secure.
We explain why this refutes GKR's claim that their construction provides
a secure one-time compiler assuming one-way functions.

4 Obliviousness, Authenticity and Secure Outsourcing 

We define obliviousness and authenticity, both with either the coarse-grained 
or fine-grained adaptivity. We show how to achieve these goals, in combination 
with adaptive privacy, via generic transforms and in the standard model. In the
full paper we provide more efficient transforms in the ROM. Finally we apply
this to obtain extremely simple and modular designs, and security proofs, for
verifiable outsourcing schemes based on the paradigm of GGP.
Obliviousness. Intuitively, a garbling scheme is oblivious if garbled function $F$
and garbled input $X$, these corresponding to $f$ and $x$, reveal nothing of $f$ or $x$
beyond side-information $\Phi(f)$. In particular, possession of $F$ and $X$ will not allow
the calculation of $y = \mathrm{ev}(f, x)$.

The formal definition for static obliviousness is from BHR. See the top
of Fig. 6. We add to this two new definitions, to incorporate either
coarse-grained or fine-grained adaptive security. See the rest of Fig. 6. Fine-grained
adaptive security continues to require that $\mathcal{G}$ be projective. The games used
for defining obliviousness closely mirror their privacy counterparts. The first
important difference is that the adversary does not get the decoding function $d$.
The second important difference is that the simulator must do without $y =
\mathrm{ev}(f, x)$. For a garbling scheme $\mathcal{G}$, side-information $\Phi$, simulator $\mathcal{S}$, adversary $\mathcal{A}$,
and security parameter $k \in \mathbb{N}$, we let $\mathbf{Adv}^{\mathrm{obv},\Phi,\mathcal{S}}_{\mathcal{G}}(\mathcal{A}, k) = 2\Pr[\mathrm{Obv}^{\mathcal{A}}_{\mathcal{G},\Phi,\mathcal{S}}(k)] - 1$,
$\mathbf{Adv}^{\mathrm{obv1},\Phi,\mathcal{S}}_{\mathcal{G}}(\mathcal{A}, k) = 2\Pr[\mathrm{Obv1}^{\mathcal{A}}_{\mathcal{G},\Phi,\mathcal{S}}(k)] - 1$, and finally $\mathbf{Adv}^{\mathrm{obv2},\Phi,\mathcal{S}}_{\mathcal{G}}(\mathcal{A}, k) =
2\Pr[\mathrm{Obv2}^{\mathcal{A}}_{\mathcal{G},\Phi,\mathcal{S}}(k)] - 1$. Garbling scheme $\mathcal{G}$ is obv-secure with respect to $\Phi$ if for
every PPT $\mathcal{A}$ there exists a simulator $\mathcal{S}$ such that $\mathbf{Adv}^{\mathrm{obv},\Phi,\mathcal{S}}_{\mathcal{G}}(\mathcal{A}, k)$ is negligible.
We similarly define obv1 and obv2 security. For xxx $\in$ {obv, obv1, obv2} we let
GS(xxx, $\Phi$) denote the set of all garbling schemes that are xxx-secure over $\Phi$.
Fig. 6 also formalizes the games underlying three definitions of authenticity,
capturing an adversary's inability to create from $F$ and $X$ a garbled output
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proc Garble (/, x) 


Obvg,#,s 

6 <^{0,1} 

if x A {0, l} f n then return _L 

if b = 1 then (F, e, d) «- Gb(l fc , /), X ■£- 1 

else (F, X) «— S(l k , <£(/)) 

return (F, X) 

En(e, x) 


proc Garble(/) 

proc Input(:e) 

Obvlg, 4 >,s 

6- {0,1} 

if b = 1 then (F, e, d) <- Gb(l fc , /) 
else F <— 5(l fc , #(/), 0) 

return F 

if x A {0, 1}^'” then return A 
if b = 1 then X <— En(e,a:) 
else X <- 5(1) 
return X 

proc Garble(/) 

proc Input(*, c) 

Obv2 g,&,s 

6 «— (0, 1}; n i- f.n; Q £- 0; a <- e 

if b = 1 then 

(F, (X?, X } , . . . , X°, X£), d) <- Gb(l fc , /) 
else F-s-5(l fc , <£(/), 0) 

return F 

if i A {1, . . . , n} \ Q then return 1 
Xi c; Q Q U {*} 
if b = 1 then Xi «- Xf* 
else Xi <- S(i, |Q|) 

return Xi 


proc Garble (/,*) 


Aut g 

if x A {0, 1}F” then return 4 

(F, e, d) £- Gb(l fe , /), X «- En(e, x) 
return (F, X) 



proc Garble(/) 

proc Input(x) 

Autl g 

(F, e,d) <— Gb(l fc ,/) 

return F 

if x A {0, 1}^'" then return A 

X ■<— En(e, x) 
return X 

proc Garble(/) 

proc Input(*, c) 

Aut2g 

n <— f-n; Q 4- 0; <r ^ e 

(F, (X?, X} , . . . , X° , X* ) , d) <r- Gb(l fe , /) 

return F 

if i A {1, . . . , n} \ Q then return 1 
Xi <— c; Q<-QU{i}, Xi-f-Xf* 
if |Q| = n then X -f- (Xj , . . . , X n ) 

return Xi 


Fig. 6. Obliviousness (top). Games for defining the obv, obvl, and obv2 security of 
Q = (Gb, En, De, Ev, ev). For each game, Finalize(6') returns ( b = b'). Authenticity 
(bottom). Games for defining the aut, autl, and aut2 security of Q = (Gb, En, De, Ev, 
ev). Procedure Finalize(F) of each game returns (De(d, Y) A A and Y A Ev(F, X)). 


$Y \neq F(X)$ that will be deemed authentic. The static definition of BHR is
strengthened either to allow the adversary to specify $x$ subsequent to obtaining
$F$, or, stronger, to provide the bits of $x$ one-by-one, each corresponding
token then being issued. For the second case, game Aut2, the garbling scheme must
once again be projective. For a garbling scheme $\mathcal{G}$, adversary $\mathcal{A}$, and security
parameter $k \in \mathbb{N}$, we let
$\mathrm{Adv}^{\mathrm{aut}}_{\mathcal{G}}(\mathcal{A},k) = 2\Pr[\mathrm{Aut}^{\mathcal{A}}_{\mathcal{G}}(k)] - 1$,
$\mathrm{Adv}^{\mathrm{aut1}}_{\mathcal{G}}(\mathcal{A},k) = 2\Pr[\mathrm{Aut1}^{\mathcal{A}}_{\mathcal{G}}(k)] - 1$, and
$\mathrm{Adv}^{\mathrm{aut2}}_{\mathcal{G}}(\mathcal{A},k) = 2\Pr[\mathrm{Aut2}^{\mathcal{A}}_{\mathcal{G}}(k)] - 1$.
Garbling scheme $\mathcal{G}$ is aut-secure if for every PPT $\mathcal{A}$,
$\mathrm{Adv}^{\mathrm{aut}}_{\mathcal{G}}(\mathcal{A},k)$ is negligible.
We similarly define aut1 and aut2 security. For xxx ∈ {aut, aut1, aut2} we let
GS(xxx) denote the set of all garbling schemes that are xxx-secure.

  proc Gb1(1^k, f)
    (F, e, d) ← Gb(1^k, f)
    F' ←$ {0,1}^{|F|}, d' ←$ {0,1}^{|d|}
    F1 ← F ⊕ F', K ←$ {0,1}^k, d1 ← (d ⊕ d', K)
    tag ← F_K(d'), e1 ← (e, d', F', tag)
    return (F1, e1, d1)

  proc En1(e1, x)
    (e, d', F', tag) ← e1
    return (En(e, x), d', F', tag)

  proc Ev1(F1, X1)
    (X, d', F', tag) ← X1, F ← F1 ⊕ F'
    Y ← Ev(F, X)
    return (Y, d', tag)

  proc De1(d1, Y1)
    (Y, d', tag) ← Y1
    (D, K) ← d1, d ← D ⊕ d'
    if tag ≠ F_K(d') then return ⊥
    return De(d, Y)

Fig. 7. Scheme all-to-all1[G] = (Gb1, En1, De1, Ev1, ev) ∈ GS(prv1, Φ) ∩ GS(obv1, Φ) ∩
GS(aut1) obtained from scheme G = (Gb, En, De, Ev, ev) ∈ GS(prv, Φ) ∩ GS(obv, Φ) ∩
GS(aut). The transform uses a PRF F : {0,1}^k × {0,1}^* → {0,1}^k, written F_K when
keyed by K.
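
The mechanics of the all-to-all1 transform of Fig. 7 can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `ToyScheme` is a hypothetical and deliberately insecure base scheme (a one-byte lookup table) used only to exercise the wrapper, HMAC-SHA256 stands in for the PRF, and the security parameter is fixed by the `k` argument. Only correctness and tag checking are demonstrated.

```python
import hashlib
import hmac
import secrets


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length strings."""
    assert len(a) == len(b)
    return bytes(p ^ q for p, q in zip(a, b))


def prf(key: bytes, msg: bytes) -> bytes:
    """Stand-in PRF (assumption: HMAC-SHA256 plays the role of F_K)."""
    return hmac.new(key, msg, hashlib.sha256).digest()


class ToyScheme:
    """Hypothetical INSECURE base scheme: f is a 256-entry byte table."""

    def gb(self, f: bytes):
        d = secrets.token_bytes(1)              # output mask
        F = bytes(v ^ d[0] for v in f)          # table with masked outputs
        return F, b"", d                        # (F, e, d)

    def en(self, e: bytes, x: bytes) -> bytes:
        return x

    def ev(self, F: bytes, X: bytes) -> bytes:
        return bytes([F[X[0]]])

    def de(self, d: bytes, Y: bytes) -> bytes:
        return bytes([Y[0] ^ d[0]])


class AllToAll1:
    """Wrapper following the four procedures of Fig. 7."""

    def __init__(self, base, k: int = 16):
        self.base, self.k = base, k

    def gb(self, f: bytes):
        F, e, d = self.base.gb(f)
        F_mask = secrets.token_bytes(len(F))    # F'
        d_mask = secrets.token_bytes(len(d))    # d'
        K = secrets.token_bytes(self.k)
        tag = prf(K, d_mask)
        return xor(F, F_mask), (e, d_mask, F_mask, tag), (xor(d, d_mask), K)

    def en(self, e1, x: bytes):
        e, d_mask, F_mask, tag = e1
        return (self.base.en(e, x), d_mask, F_mask, tag)

    def ev(self, F1: bytes, X1):
        X, d_mask, F_mask, tag = X1
        Y = self.base.ev(xor(F1, F_mask), X)    # unmask F, then evaluate
        return (Y, d_mask, tag)

    def de(self, d1, Y1):
        Y, d_mask, tag = Y1
        D, K = d1
        if not hmac.compare_digest(tag, prf(K, d_mask)):
            return None                         # plays the role of ⊥
        return self.base.de(xor(D, d_mask), Y)
```

The garbled function alone is a one-time-pad encryption of F; the pads only arrive with the garbled input, which is why the garbled input becomes long.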

Achieving obv1 and aut1 security. It is tempting to think that the
prv-to-prv1 operator given earlier also promotes xxx-security, with xxx ∈ {obv, aut},
to xxx1-security, but it does not. We now show how to change prv-to-prv1 to
an operator all-to-all1 that promotes any xxx ∈ {prv, obv, aut} to being xxx1
secure. See Fig. 7. The proof of the following is in the full paper.

Theorem 5. (1) For any $\Phi$ and any xxx ∈ {prv, obv}, if $\mathcal{G}$ ∈ GS(xxx, $\Phi$) then
all-to-all1[$\mathcal{G}$] ∈ GS(xxx1, $\Phi$). (2) If $\mathcal{G}$ ∈ GS(aut) then all-to-all1[$\mathcal{G}$] ∈ GS(aut1).
(3) If $\mathcal{G}$ ∈ GS(proj) then all-to-all1[$\mathcal{G}$] ∈ GS(proj).

Achieving obv2 and aut2 security. The transform to promote coarse-
grained to fine-grained security is unchanged. We let all1-to-all2 = prv1-to-prv2
be the transform given earlier for privacy. We claim it has additional features
captured by the following, whose proof is in the full paper.

Theorem 6. (1) For any $\Phi$ and any xxx ∈ {prv, obv}, if $\mathcal{G}_1$ ∈ GS(xxx1, $\Phi$) ∩
GS(proj) then all1-to-all2[$\mathcal{G}_1$] ∈ GS(xxx2, $\Phi$) ∩ GS(proj). (2) If $\mathcal{G}_1$ ∈ GS(aut1) ∩
GS(proj) then all1-to-all2[$\mathcal{G}_1$] ∈ GS(aut2) ∩ GS(proj).

Outsourcing definitions. Towards the application to secure outsourcing,
we begin with the definitions, following GGP. An outsourcing scheme $\Pi$ =
(Gen, Inp, Out, Comp, ev) is a tuple of PT algorithms that, intuitively, will be run
partly on a client and partly on a server. Generation algorithm Gen is run by the
client on input of the unary encoding $1^k$ and a string $f$ describing the function
$\mathrm{ev}(f, \cdot) : \{0,1\}^{f.n} \to \{0,1\}^{f.m}$ to be evaluated (so that ev, like in a garbling
scheme, is a deterministic evaluation algorithm) to get back a public key $pk$ that


150 M. Bellare, V.T. Hoang, and P. Rogaway 


is sent to the server and a secret key $sk$ that is kept by the client. Algorithm Inp
is run by the client on input $pk$, $sk$ and $x \in \{0,1\}^{f.n}$ to return a garbled input
$X$ that is sent to the server. Associated state information $St$ is preserved by
the client. Algorithm Comp is run by the server on input $pk$, $X$ to get a garbled
output $Y$ that is returned to the client. The latter runs deterministic algorithm
Out on $pk, sk, Y, St$ to get back $y \in \{0,1\}^{f.m} \cup \{\bot\}$. Correctness requires that
for all $k \in \mathbb{N}$, all $f \in \{0,1\}^*$, and all $x \in \{0,1\}^{f.n}$, if $(pk, sk) \leftarrow \mathrm{Gen}(1^k, f)$,
$(X, St) \leftarrow \mathrm{Inp}(pk, sk, x)$, $Y \leftarrow \mathrm{Comp}(pk, X)$, and $y \leftarrow \mathrm{Out}(pk, sk, Y, St)$, then
$y = \mathrm{ev}(f, x)$. Our syntax is the same as that of GGP except for distinguishing
between functions and their descriptions, as represented by the addition of ev to
the list.

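The games $\mathrm{OSVF}_{\Pi}$ and $\mathrm{OSPR}_{\Pi,\Phi,\mathcal{S}_{os}}$ of Fig. 8 are used to define verifiability
and privacy of an outsourcing scheme $\Pi$ = (Gen, Inp, Out, Comp, ev), where $\Phi$
is a side-information function and $\mathcal{S}_{os}$ is a simulator. In both games, the adver-
sary is allowed only one GetPK query, and this must be its first oracle query.
For adversaries $\mathcal{A}_{os}$ and $\mathcal{B}_{os}$, we let
$\mathrm{Adv}^{\mathrm{osvf}}_{\Pi}(\mathcal{A}_{os}, k) = \Pr[\mathrm{OSVF}^{\mathcal{A}_{os}}_{\Pi}(k)]$ and
$\mathrm{Adv}^{\mathrm{ospr},\Phi,\mathcal{S}_{os}}_{\Pi}(\mathcal{B}_{os}, k) = 2\Pr[\mathrm{OSPR}^{\mathcal{B}_{os}}_{\Pi,\Phi,\mathcal{S}_{os}}(k)] - 1$.
We say that $\Pi$ is verifiable if
$\mathrm{Adv}^{\mathrm{osvf}}_{\Pi}(\mathcal{A}_{os}, \cdot)$ is negligible for all PT adversaries $\mathcal{A}_{os}$. We say that $\Pi$ is
private over $\Phi$ if for all PT adversaries $\mathcal{B}_{os}$ there is a PT simulator $\mathcal{S}_{os}$ such that
$\mathrm{Adv}^{\mathrm{ospr},\Phi,\mathcal{S}_{os}}_{\Pi}(\mathcal{B}_{os}, \cdot)$ is negligible. An adversary is said to be one-time if it makes
only one Input query. We say that $\Pi$ is one-time verifiable if $\mathrm{Adv}^{\mathrm{osvf}}_{\Pi}(\mathcal{A}_{os}, \cdot)$ is
negligible for all PT one-time adversaries $\mathcal{A}_{os}$. We say that $\Pi$ is one-time private
over $\Phi$ if for all PT one-time adversaries $\mathcal{B}_{os}$ there is a PT simulator $\mathcal{S}_{os}$ such
that $\mathrm{Adv}^{\mathrm{ospr},\Phi,\mathcal{S}_{os}}_{\Pi}(\mathcal{B}_{os}, \cdot)$ is negligible.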

Our verifiability definition coincides with that of GGP but our privacy
definition is stronger: it requires not just “input privacy” (concealing each in-
put $x$) but, also, privacy of the function $f$ (relative to $\Phi$). (As in our garbling
definitions this is subject to $\Phi(f)$ being revealed.) Also, while GGP use an
indistinguishability-style formalization, we use a simulation-style one, as this
is stronger for some side-information functions.

To be “interesting” the work of the client in an outsourcing scheme should
be less than the work required to compute the function directly, for otherwise
outsourcing is not buying anything. An outsourcing scheme is said to be non-
trivial if this condition is met.

From garbling to outsourcing. GGP show how to use FHE to turn
any one-time verifiable and private outsourcing scheme into a fully verifiable
and private one. This allows us to focus on designing the former. We show how
a garbling scheme that is both aut1 and obv1 secure immediately implies a
one-time verifiable and private outsourcing scheme. The construction, given in
Fig. 8, is very direct, and the proof of the following, given in the full paper,
is trivial. Both points reinforce our claim that the garbling scheme abstraction
and adaptive security may be easily used in applications:

Theorem 7. If $\mathcal{G}$ ∈ GS(obv1, $\Phi$) ∩ GS(aut1) then outsourcing scheme $\Pi[\mathcal{G}]$ is
one-time verifiable and also one-time private over $\Phi$.


Adaptively Secure Garbling 151 


Game OSVF_Π

  proc GetPK(f)
    (pk, sk) ← Gen(1^k, f), i ← 0
    return pk

  proc Input(x)
    if x ∉ {0,1}^{f.n} then return ⊥
    i ← i + 1, x_i ← x
    (X_i, St_i) ← Inp(pk, sk, x)
    return X_i

  proc Finalize(Y, j)
    if j ∉ {1, ..., i} then return false
    y ← Out(pk, sk, Y, St_j)
    return (y ∉ {ev(f, x_j), ⊥})

Game OSPR_{Π,Φ,S_os}

  proc GetPK(f)
    c ←$ {0,1}
    if c = 1 then (pk, sk) ← Gen(1^k, f)
    else (pk, σ) ← S_os(1^k, Φ(f))
    return pk

  proc Input(x)
    if x ∉ {0,1}^{f.n} then return ⊥
    if c = 1 then (X, St) ← Inp(pk, sk, x)
    else (X, σ) ← S_os(σ)
    return X

  proc Finalize(c')
    return (c = c')

Construction Π[G]

  Gen(1^k, f):               (F, e, d) ← Gb(1^k, f); return (F, (e, d))
  Inp(F, (e, d), x):         X ← En(e, x); return (X, ε)
  Comp(F, X):                Y ← Ev(F, X); return Y
  Out(F, (e, d), Y, St):     y ← De(d, Y); return y

Fig. 8. Games to define the verifiability (OSVF) and privacy (OSPR) of outsourcing
scheme Π = (Gen, Inp, Out, Comp, ev). Bottom: constructing the outsourcing scheme
Π[G] = (Gen, Inp, Out, Comp, ev) from garbling scheme G = (Gb, En, De, Ev, ev).
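
The construction at the bottom of Fig. 8 is a thin wrapper around the garbling scheme, as the following Python sketch suggests. It is illustrative only: `ToyGarbling` is a hypothetical, insecure placeholder for an obv1 + aut1 scheme (so nothing here is verifiable or private), the security parameter is left implicit, and the method names are chosen for this example.

```python
import secrets


class ToyGarbling:
    """Hypothetical INSECURE garbling scheme over a 256-entry byte table f."""

    def gb(self, f: bytes):
        d = secrets.token_bytes(1)
        F = bytes(v ^ d[0] for v in f)      # outputs masked by d
        return F, b"", d                    # (F, e, d)

    def en(self, e: bytes, x: bytes) -> bytes:
        return x

    def ev(self, F: bytes, X: bytes) -> bytes:
        return bytes([F[X[0]]])

    def de(self, d: bytes, Y: bytes) -> bytes:
        return bytes([Y[0] ^ d[0]])


class Outsourcing:
    """Pi[G]: pk is the garbled function F, sk is the pair (e, d)."""

    def __init__(self, garbling):
        self.g = garbling

    def gen(self, f: bytes):
        F, e, d = self.g.gb(f)
        return F, (e, d)                    # (pk, sk)

    def inp(self, pk, sk, x: bytes):
        e, _ = sk
        return self.g.en(e, x), b""         # (X, St) with empty state

    def comp(self, pk, X):
        return self.g.ev(pk, X)             # run by the server

    def out(self, pk, sk, Y, st):
        _, d = sk
        return self.g.de(d, Y)              # run by the client


# One round trip: the client generates keys, the server computes on X.
table = bytes((i * i) % 256 for i in range(256))
scheme = Outsourcing(ToyGarbling())
pk, sk = scheme.gen(table)
X, st = scheme.inp(pk, sk, bytes([12]))
y = scheme.out(pk, sk, scheme.comp(pk, X), st)
```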


A benefit of our modular approach is that we may use any obv1 + aut1 garbling
scheme as a starting point, while GGP were tied to the scheme of LP. However,
the latter scheme is not adaptively secure, which brings us to our next point.

Discussion. GGP give a proof that their outsourcing scheme is one-time verifi-
able assuming the encryption scheme underlying the garbled-circuit construction
of LP meets the condition called Yao-secure there. However, their proof has
a gap. Quoting GGP (p. 12 of the Aug 2010 ePrint version): “For any two values $x, x'$
with $f(x) = f(x')$, the security of Yao’s protocol implies that no efficient player
$P_2$ can distinguish if $x$ or $x'$ was used.” This claim is correct if both $x$ and $x'$
are chosen independently of the randomness in the garbled circuit. But in their
setting, the string $x$ is chosen after the adversary sees the garbled circuit, and
the security proof given by LP no longer applies.

One may try to give a new proof that the LP garbling scheme satisfies aut1 se-
curity. However, this seems to be difficult. Intuitively, an adaptive attack on the
garbling scheme allows the adversary to mount a key-revealing selective-opening
(SOA-K) attack on the underlying encryption scheme. But SOA-K secure en-
cryption is notoriously hard to achieve. The only known way to achieve it is
via non-committing encryption, which is only possible with keys as long
as the total number of bits of message ever encrypted, so the outsourcing
scheme may fail to be non-trivial.

This brings us to a fuller discussion of non-triviality. The obv1 + aut1
secure scheme obtained via our all-to-all1 transform has long garbled inputs, so
the one-time verifiable outsourcing scheme yielded by Theorem 7, while secure,
is not non-trivial. Our ROM transforms coupled with Theorem 7 yield a non-
trivial one-time outsourcing scheme in the ROM, but the FHE-based method of
GGP of lifting to a many-time scheme fails in the ROM. Finding an obv1 + aut1
garbling scheme with short garbled inputs in the standard model under standard
assumptions is an open problem. We think Theorem 7 is still useful because it
can be applied at any point such a scheme emerges. All this again is an indication
of the subtleties and hidden challenges underlying adaptive security of garbled
circuits that seem to have been overlooked in the literature.

Abstract. This paper presents the Generalized Randomized Iterate of a
(regular) one-way function $f$ and shows that it can be used to build Universal
One-Way Hash Function (UOWHF) families with $O(n^2)$ key length.

We then show that Shoup’s technique for UOWHF domain extension
can be used to improve the efficiency of the previous construction. We
present the Reusable Generalized Randomized Iterate, which consists of
$k > n + 1$ iterations of a regular one-way function composed at each
iteration with a pairwise independent hash function, where we only use
$\log k$ such hash functions, and we “schedule” them according to the same
scheduling as Shoup’s domain extension technique. The end result is a
UOWHF construction from regular one-way functions with an $O(n \log n)$
key. These are the first such efficient constructions of UOWHFs from
regular one-way functions of unknown regularity.

Finally we show that Shoup’s domain extension technique can also
be used in lieu of derandomization techniques to improve the efficiency
of PRGs and of hardness amplification constructions for regular one-way
functions.


1 Introduction 

One of the central results in modern cryptography is that one-way functions
imply digital signatures. This result was first established by
Naor and Yung for one-way permutations via the notion of Universal One-
Way Hash Functions (UOWHFs). Later Rompel proved that UOWHFs can
be built from any one-way function. The notion of UOWHF is interesting on its
own, apart from its connection to digital signatures. UOWHFs are compressing
functions (i.e. the output is shorter than the input) which enjoy a target collision
resistance property: a function family $\mathcal{G}$ is a UOWHF if no efficient adversary $\mathcal{A}$
succeeds in the following game with non-negligible probability:

- $\mathcal{A}$ chooses a target input $z$;

- a randomly chosen function $g \in \mathcal{G}$ is selected;

- $\mathcal{A}$ finds a collision for $g(z)$, i.e. an input $z' \neq z$ such that $g(z) = g(z')$.
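
The order of moves in this game (target first, key second, collision last) can be made concrete with a small Python sketch. Everything here is an assumption for illustration: the family `g` is a toy keyed hash with a one-byte output, chosen so that a brute-force adversary wins and the mechanics are visible; it is of course not a UOWHF, and the function names are hypothetical.

```python
import hashlib
import secrets


def g(key: bytes, z: bytes) -> bytes:
    """Toy compressing family: 2-byte inputs, 1-byte outputs (NOT secure)."""
    return hashlib.sha256(key + z).digest()[:1]


def tcr_game(choose_target, find_collision, key_len: int = 16) -> bool:
    """Run the target-collision game; return True if the adversary wins."""
    z, state = choose_target()               # adversary commits to z first
    key = secrets.token_bytes(key_len)       # only then is the key sampled
    z2 = find_collision(state, key)          # adversary sees the key
    return z2 is not None and z2 != z and g(key, z) == g(key, z2)


def choose_target():
    z = b"\x00\x00"
    return z, z                              # the state just remembers z


def brute_force(state: bytes, key: bytes):
    """Search all other 2-byte inputs for one colliding with the target."""
    target = g(key, state)
    for i in range(1, 1 << 16):
        cand = i.to_bytes(2, "big")
        if g(key, cand) == target:
            return cand
    return None
```

Against a real UOWHF no efficient `find_collision` should succeed with non-negligible probability; the toy family fails only because its output is a single byte.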

X. Wang and K. Sako (Eds.): ASIACRYPT 2012, LNCS 7658, pp. 154-171, 2012.
A seemingly weaker notion is second preimage resistance, where the target input
$z$ is randomly chosen (rather than by $\mathcal{A}$). It is however well known how to convert
a second preimage resistant function family into a UOWHF.

The security of these constructions is proven by reductions: given an adver-
sary $\mathcal{A}$ that wins the above UOWHF game, we build an “inverter” $I$ that is able
to solve a computationally hard problem, e.g. invert a one-way function. A cru-
cial feature of these reductions is their efficiency, i.e. the relationship between
the running time of $\mathcal{A}$ and $I$, and the resulting degradation in the secu-
rity parameters. For the case of UOWHFs one of the most important efficiency
measures is the size of the key needed to run the algorithm.

Unfortunately the constructions of UOWHFs based on general one-way func-
tions do not fare very well on that front. If $n$ is the security parameter, the
original Rompel construction yielded a key of size $O(n^{12})$, which was later im-
proved to $O(n^7)$ by Haitner et al. Conversely, under the much stronger
assumption of one-way permutations, Naor and Yung achieve linear key
size. Apart from the above works, we are aware of only one work, by De Santis
and Yung, that constructs UOWHFs from regular one-way functions (i.e. func-
tions that have constant size preimages). Their construction achieves $O(n \log n)$
key size but is very complicated and, more importantly, requires knowledge of the
regularity parameter.

We go back to investigating the construction of UOWHFs from regular one-
way functions. We obtain a very simple construction with $O(n \log n)$ key size,
which does not require knowledge of the regularity parameter. These are the
first such efficient constructions of UOWHFs from regular one-way functions of
unknown regularity.

Somewhat surprisingly, our UOWHF construction is obtained via a simple
“tweak” on a well-known algorithm for pseudo-random number generation from
regular one-way functions: the Randomized Iterate. Another surprising con-
nection established by this paper is that Shoup’s domain extension technique
can be used to improve the seed size in both the PRG and UOWHF.

Motivation. Collision resistant hashing is a ubiquitous tool in cryptography,
and in practice a stronger notion of collision resistance is used where the adver-
sary is given as input just $H \in \mathcal{H}$ and must find $z, z'$ that collide (we will refer to
this notion as full collision resistance as opposed to the target collision-resistance
property enjoyed by UOWHFs).

This is problematic because there is strong evidence that this stronger notion
cannot be achieved by assuming just OWFs. Simon proves that there is no
black-box construction¹ of a fully collision resistant hash function from one-way
permutations. While a non black-box construction based on OWFs remains theo-
retically possible, such a construction would probably be very inefficient, since
efficient constructions based on general assumptions seem to be black-box
ones.


¹ Informally, a black-box construction accesses the underlying OWF only via input
queries, without any knowledge of its internal structure.
Furthermore the cryptanalysis of practical and widely adopted supposedly
collision-resistant functions has reminded us of the importance of construct-
ing efficient candidates for collision-resistant functions which are also provably
secure, i.e. have a security reduction to a well established computationally hard
problem. The above explains why researchers and practitioners alike are looking
at UOWHFs to replace full collision resistant hashing in practical applications
(such as certifications; see for example the work of Halevi and Krawczyk on
randomized hashing).

Current efficient candidates for UOWHFs have either no proof of security
or make stronger assumptions than the existence of OWFs². Achieving a truly
efficient UOWHF construction based on OWFs would offer practitioners a target
collision-resistant function which can be used in practice and gives the peace of
mind of a strong security guarantee.

In order to achieve this goal, our construction slightly relaxes the assumption
to regular OWFs, yielding a dramatic improvement to an $O(n \log n)$ key size. We
are following the same approach as earlier work on pseudo-random generators: looking at
the more limited case of regular OWFs not only to improve the efficiency, but
also to explore techniques that might benefit constructions in the general case
(which is what happened in the PRG case).


1.1 Our Contribution 

We present a new algorithm, which we call the Generalized Randomized Iterate
(GRI), which depending on its parameters can be used to build either PRGs
or UOWHFs starting from regular one-way functions.

First proposed by Goldreich, Krawczyk and Luby, the original Randomized Iterate
construction involves composing the regular one-way function with different
$n$-wise independent (later improved to simply pair-wise independent) universal
hash functions at each iteration.
More specifically, if $f$ is a regular one-way function and $h_1, \dots, h_m$ are pairwise
independent hash functions, all from $\{0,1\}^n$ to $\{0,1\}^n$, then for $k \le m$ the $k$-th
randomized iterate of $f$ using the $h_i$ is defined as
$f^k = f \circ h_k \circ f \circ h_{k-1} \circ \dots \circ f \circ h_1 \circ f$. It has been
shown that this function is hard to invert at each stage and therefore can be
used to construct PRGs in conjunction with a generic hard-core predicate (such
as the Goldreich-Levin bit).

We generalize the Randomized Iterate to use compressing pair-wise indepen-
dent hash functions $h_i$ at each stage. Somewhat surprisingly, we then show that
the resulting family (see Definition 8) is second-preimage resistant.

Notice that in the above applications the universal hash functions $h_i$ are part
of the secret key of the resulting algorithm (the seed for the PRG, the index key
for the UOWHF). Therefore it is desirable to have constructions in which the
number of functions can be minimized.

² For example, Halevi and Krawczyk propose a mode of operation for typical
hash functions such as SHA-1 that creates a UOWHF under an assumption on the
compression function which is seemingly stronger than OWF, but somewhat weaker
than full collision resistance.
The Randomized Iterate PRG construction has an $O(n^2)$ seed, but
it was also shown how an $O(n \log n)$ seed could be achieved by using generic
de-randomization techniques. First we point out that this approach does not
immediately work in the UOWHF case, as in order to reduce the key size, the
de-randomization procedure requires an additional property³.

We then explore another fascinating and somewhat unexpected connection.
We observe that instead of using de-randomization techniques, the structure of
the Generalized Randomized Iterate can be improved by using Shoup’s domain
extension technique for UOWHFs. We define the Reusable Generalized Ran-
domized Iterate (RGRI): using Shoup’s approach we prove that it is possible to
“recycle” some of the hash functions in the Generalized Randomized Iterate,
reducing their number to $O(\log m)$ for $m$ iterations (instead of $m$). The net
result is that we achieve a UOWHF with $O(n \log n)$ key size.

Finally we point out that the RGRI also yields an $O(n \log n)$-seed PRG from
regular one-way functions, and can also be used for hardness amplification of reg-
ular one-way functions, obtaining alternative proofs of results already appearing
in prior work.

1.2 Comparison with Previous Work 

We have already mentioned the previous works on UOWHFs based on general
assumptions and how they compare to our work.

As discussed above, our UOWHF construction uses in a crucial way tools that
were developed for the task of pseudo-random generation. In this sense our work
follows the path of recent papers on inaccessible entropy. Those beauti-
ful works elegantly show that the known constructions of PRGs and UOWHFs
can be interpreted as similar manipulation techniques on different forms of
computational entropy (pseudo-entropy for PRGs and inaccessible entropy for
UOWHFs). While less general, our work shows a more direct and specific con-
nection: a single algorithm (the Generalized Randomized Iterate) which is suffi-
ciently “flexible” to be used either as a PRG or as a UOWHF.

1.3 Paper Organization 

We briefly recall the relevant definitions in Section 2. In Section 3 we introduce
the Generalized Randomized Iterate and its Reusable variant; we also prove
a main technical Lemma that is at the heart of the efficiency claim for our
UOWHF construction, which appears in a later section. We then present our alternative
constructions of an $O(n \log n)$-seed PRG and the hardness amplification result
(the proofs of these constructions will appear in the full version). We
conclude with some discussions and open problems.


³ The actual de-randomization algorithm (the Nisan-Zuckerman PRG for space-
bounded computations) used in that work has this property, but a generic PRG for
space-bounded computation might not.
2 Preliminaries and Definitions

2.1 One-Way Functions

Definition 1. Let $f : \{0,1\}^* \to \{0,1\}^*$ be a polynomial-time computable func-
tion. $f$ is one-way if for every PPT machine $A$, there exists a negligible function
$\nu(\cdot)$ such that

$\Pr[x \leftarrow \{0,1\}^n;\ y = f(x) : A(1^n, y) \in f^{-1}(f(x))] \le \nu(n)$

Definition 2 (Regular One-Way Functions). Let $f : \{0,1\}^* \to \{0,1\}^*$ be a
one-way function. $f$ is regular if there exists a function $\alpha : \mathbb{N} \to \mathbb{N}$ such that for
every $n \in \mathbb{N}$ and every $x \in \{0,1\}^n$ we have:

$|f^{-1}(f(x))| = \alpha(n)$

We assume that the regularity $\alpha(\cdot)$ of a function $f$ is not known (i.e. not poly-
nomial time computable). Without loss of generality, we assume the one-way
function is length preserving, i.e. $f(\{0,1\}^n) \subseteq \{0,1\}^n$.

2.2 Hardcore Predicates

Definition 3. Let $f : \{0,1\}^n \to \{0,1\}^*$ and $b : \{0,1\}^n \to \{0,1\}$ be polynomial-
time computable functions. We say $b$ is a hardcore predicate of $f$ if for every
PPT machine $A$, there exists a negligible function $\nu(\cdot)$ such that

$\Pr[x \leftarrow \{0,1\}^n;\ y = f(x) : A(1^n, y) = b(x)] \le \frac{1}{2} + \nu(n)$

If $f$ is a one-way function over $\{0,1\}^n$ then Goldreich and Levin prove that
the one-way function $f'$ over $\{0,1\}^{2n}$ defined as $f'(x, r) = (f(x), r)$ admits the
following hard-core predicate: $b(x, r) = \langle x, r \rangle = \sum_i x_i r_i \bmod 2$, where $x_i, r_i$ is the
$i$-th bit of $x, r$ respectively. In the following we refer to this predicate as the GL
bit of $f$.
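
As a small illustration, the GL bit is just an inner product over GF(2). The Python sketch below assumes, for convenience, that the bit strings $x$ and $r$ are packed into non-negative integers; the function name is chosen for this example.

```python
def gl_bit(x: int, r: int) -> int:
    """Inner product <x, r> mod 2 of two bit strings packed as integers."""
    # AND keeps the positions where both bits are 1; the parity of the
    # number of such positions is the sum of x_i * r_i modulo 2.
    return bin(x & r).count("1") % 2
```

The predicate is linear in $x$ for fixed $r$, that is, `gl_bit(x1 ^ x2, r) == gl_bit(x1, r) ^ gl_bit(x2, r)`.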


2.3 Pseudorandom Generators

Definition 4. Let $G : \{0,1\}^n \to \{0,1\}^{l(n)}$ be a polynomial time computable
function where $l(n) > n$. We say $G$ is a pseudorandom generator if for every
PPT machine $A$, there exists a negligible function $\nu(n)$ such that

$|\Pr[x \leftarrow \{0,1\}^n;\ y \leftarrow G(x) : A(1^n, y) = 1] - \Pr[y \leftarrow \{0,1\}^{l(n)} : A(1^n, y) = 1]| \le \nu(n)$
2.4 Universal One-Way Hash Function Families

Definition 5. Let $\mathcal{G} = \{g_k\}_{k \in \mathcal{K}}$ be a family of functions where each function
$g_k$ goes from $\{0,1\}^{n+\ell}$ to $\{0,1\}^n$. We say that $\mathcal{G}$ is a Universal One-Way
Hash Function Family if (i) the functions $g_k$ are efficiently computable and (ii)
for every efficient adversary $A$, the probability that $A$ succeeds in the following
game is negligible in $n$:

- Let $(x, \sigma) \leftarrow A(1^n)$

- Choose $k \leftarrow \mathcal{K}$

- Let $x' \leftarrow A(\sigma, k)$

- $A$ succeeds if $x \neq x'$ and $g_k(x) = g_k(x')$

Universal One-Way Hash Function Families as defined above enjoy the prop-
erty of target collision-resistance. Next, we define the seemingly weaker notion
of Second Preimage Resistance, where the adversary cannot find a collision for a
randomly chosen input and key. It is well-known how to construct UOWHFs
from second preimage resistant families.

Definition 6 (Second Preimage Resistance). Let $\mathcal{G} = \{g_k\}_{k \in \mathcal{K}}$ be a family
of functions where each function $g_k$ goes from $\{0,1\}^{n+\ell}$ to $\{0,1\}^n$. We say that
$\mathcal{G}$ is a Second Preimage Resistant Hash Function Family if (i) the functions
$g_k$ are efficiently computable and (ii) for every efficient adversary $A$, the
following probability

$\Pr[z \leftarrow \{0,1\}^{n+\ell};\ k \leftarrow \mathcal{K};\ A(z, k) = z' : z \neq z' \text{ and } g_k(z) = g_k(z')]$

is negligible in $n$.


2.5 Universal Hash Function Families

Definition 7. Let $\mathcal{H}$ be a family of functions where each function $h \in \mathcal{H}$ goes
from $\{0,1\}^{n+\ell}$ to $\{0,1\}^n$. We say that $\mathcal{H}$ is an efficient family of pairwise
independent hash functions if (i) the functions $h \in \mathcal{H}$ can be described with a
polynomial (in $n$) number of bits; (ii) there is a polynomial (in $n$) time algorithm
to compute $h \in \mathcal{H}$; (iii) for all $x \neq x' \in \{0,1\}^{n+\ell}$ and for all $y, y' \in \{0,1\}^n$

$\Pr_{h \in \mathcal{H}}[h(x) = y \text{ and } h(x') = y'] = 2^{-2n}$
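
Condition (iii) can be checked exhaustively on a small example. The Python sketch below uses the classical affine family $h_{a,b}(x) = ax + b \bmod p$ over the integers modulo a prime, which is an assumption made for illustration: it maps $\mathbb{Z}_p$ to $\mathbb{Z}_p$ rather than $\{0,1\}^{n+\ell}$ to $\{0,1\}^n$, so the target probability is $1/p^2$ instead of $2^{-2n}$.

```python
from fractions import Fraction
from itertools import product

P = 5  # small prime so the whole family can be enumerated


def h(a: int, b: int, x: int) -> int:
    """Member h_{a,b} of the affine family over Z_P."""
    return (a * x + b) % P


def joint_probability(x: int, x2: int, y: int, y2: int) -> Fraction:
    """Pr over uniform (a, b) that h(x) = y and h(x2) = y2."""
    hits = sum(
        1
        for a, b in product(range(P), repeat=2)
        if h(a, b, x) == y and h(a, b, x2) == y2
    )
    return Fraction(hits, P * P)


def is_pairwise_independent() -> bool:
    """Check the defining condition for every pair of distinct inputs."""
    return all(
        joint_probability(x, x2, y, y2) == Fraction(1, P * P)
        for x, x2, y, y2 in product(range(P), repeat=4)
        if x != x2
    )
```

For distinct inputs the two constraints form a linear system with exactly one solution $(a, b)$, which is why each pair of outputs is hit by exactly one function in the family.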

3 The Generalized Randomized Iterate 

A well known fact about one-way functions is that if you iterate them, you may
not end up with a function that is difficult to invert. Indeed, while a permutation
$f$, when iterated as $f^{(i)} = f \circ \dots \circ f$ (i.e. $f$ composed with itself $i$ times), remains
one-way, this is not true for general one-way functions, as a single application
could concentrate the outputs on a very small fraction of the inputs of $f$, where
$f$ might even be easy to invert.

Goldreich, Krawczyk and Luby introduced the Randomized Iterate con-
struction, where a randomization step is added between two applications of $f$
in its iteration. As shown in later work, when using pair-wise independent hashing to
implement this randomization step, the Randomized Iterate is hard to invert.

We introduce the Generalized Randomized Iterate (GRI) and we show how
it can be used to construct both pseudo-random generators and target collision-
resistant hashing. We then show a randomness efficient form of the (generalized)
Randomized Iterate, where some of the hash functions are “recycled” during
the iteration. This Reusable Generalized Randomized Iterate is the core of our
efficient construction of UOWHFs.

Definition 8. Let $f : \{0,1\}^n \to \{0,1\}^n$ and let $\mathcal{H}$ be an efficient family of
pairwise-independent hash functions from $\{0,1\}^{n+\ell}$ to $\{0,1\}^n$. For input $x \in
\{0,1\}^n$, $z \in \{0,1\}^{\ell k}$, $h_1, \dots, h_m \in \mathcal{H}$ and $m \ge k$, define the $k$-th Generalized
Randomized Iterate $g^k : \{0,1\}^n \times \{0,1\}^{\ell k} \times \mathcal{H}^m \to \{0,1\}^n$ recursively as:

$g^k(x, z, h_1, \dots, h_m) = h_k\big(f(g^{k-1}(x, z, h_1, \dots, h_m)) \,\|\, z_{[(k-1)\ell+1 \dots k\ell]}\big)$

where $g^0(x, z, h_1, \dots, h_m) = x$, $\|$ denotes concatenation and $z_{[a \dots b]}$ is the sub-
string of $z$ from position $a$ to position $b$.

In other words, at each iteration of the Generalized Randomized Iterate, first $f$
is applied to the output of the previous iteration, then a block of $\ell$ bits from $z$
is appended to the output, and then a pair-wise independent hash function is
applied. Note that at each iteration a new hash function is used.

While we are defining GRI for any value of $\ell$, we are going to be interested in
two cases:

- $\ell = 0$, in which case $z$ is the empty string and the pair-wise independent hash
functions map $n$ bits to $n$ bits. This case is equivalent to the Randomized
Iterate and, as shown in prior work, it can be used to build PRGs;

- $\ell = 1$, in which case $z$ is $k$ bits long and the hash functions compress one bit.
We will show later that this function is a second preimage resistant
function (from which a UOWHF can be easily built).
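
The recursion of Definition 8 unrolls into a simple loop. The following Python sketch is illustrative only, under several assumptions: the toy length `N = 16`, a truncated SHA-256 as a stand-in for $f$ (not claimed to be regular or one-way at this size), and random affine maps over GF(2) as the pairwise-independent family. All names are hypothetical.

```python
import hashlib
import secrets

N = 16  # toy value of n (bits)


def f(x: int) -> int:
    """Stand-in for f on N-bit strings (assumption: truncated SHA-256)."""
    digest = hashlib.sha256(x.to_bytes(2, "big")).digest()
    return int.from_bytes(digest[:2], "big")


def sample_hash(ell: int):
    """Random affine map v -> A v + b over GF(2), from N+ell to N bits."""
    rows = [secrets.randbits(N + ell) for _ in range(N)]
    offset = secrets.randbits(N)
    return rows, offset


def apply_hash(h, v: int) -> int:
    rows, offset = h
    out = 0
    for row in rows:
        # each output bit is the parity of (row AND v)
        out = (out << 1) | (bin(row & v).count("1") & 1)
    return out ^ offset


def gri(k: int, x: int, z_bits, hs, ell: int) -> int:
    """k-th Generalized Randomized Iterate g^k(x, z, h_1, ..., h_m)."""
    y = x                                        # g^0 = x
    for j in range(1, k + 1):
        v = f(y)
        for bit in z_bits[(j - 1) * ell : j * ell]:
            v = (v << 1) | bit                   # append the j-th block of z
        y = apply_hash(hs[j - 1], v)             # a fresh hash h_j each round
    return y
```

With `ell = 0` the block of `z` is empty and the loop reduces to the plain Randomized Iterate; with `ell = 1` each round absorbs one extra input bit.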


3.1 The Reusable Generalized Randomized Iterate 

We now introduce the Reusable Generalized Randomized Iterate (RGRI), which is
a version of the Generalized Randomized Iterate that uses fewer hash functions.
While the GRI described in the previous section uses a new distinct hash function
at each iteration, we “recycle” some of these hash functions during the process.
More specifically, we sample $m$ hash functions $h_1, \dots, h_m$ from $\mathcal{H}$ and then in
the $i$-th iteration of the RGRI we use the function $h_{\phi(i)}$, where $\phi(i)$ is
one greater than the largest exponent $e$ such that $2^e$ divides $i$. It is not hard
to see that if we have $k$ iterations it is sufficient to set $m = \lceil \log k \rceil + 1$. This
“scheduling” of the hash functions is identical to the way Shoup recycles random
masks in his construction of a domain extender for TCR functions.


The Generalized Randomized Iterate and Its Application 




Definition 9. Let $f : \{0,1\}^n \to \{0,1\}^n$ and let $\mathcal{H}$ be an efficient family of
pairwise-independent hash functions from $\{0,1\}^{n+\ell}$ to $\{0,1\}^n$. For input $x \in
\{0,1\}^n$, $z \in \{0,1\}^{\ell k}$, $h_1, \dots, h_m \in \mathcal{H}$ and $m \ge \lceil \log k \rceil + 1$, define the $k$-th Reusable
Generalized Randomized Iterate $g^k : \{0,1\}^n \times \{0,1\}^{\ell k} \times \mathcal{H}^m \to \{0,1\}^n$ recursively
as:

$g^k(x, z, h_1, \dots, h_m) = h_{\phi(k)}\big(f(g^{k-1}(x, z, h_1, \dots, h_m)) \,\|\, z_{[(k-1)\ell+1 \dots k\ell]}\big)$ if $k > 0$,
$g^k(x, z, h_1, \dots, h_m) = x$ otherwise,

where $\phi(n)$ is one greater than the largest exponent $e$ such that $2^e$ divides $n$.
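
The schedule $\phi$ is easy to compute and inspect. The short Python sketch below (function names chosen for this example) lists which hash index each iteration uses and shows that the number of distinct functions grows only logarithmically in the number of iterations.

```python
def phi(i: int) -> int:
    """One plus the exponent of the largest power of 2 dividing i (i >= 1)."""
    assert i >= 1
    # i & -i isolates the lowest set bit 2^e; its bit length is e + 1.
    return (i & -i).bit_length()


def schedule(k: int):
    """Hash indices used in iterations 1..k of the reusable iterate."""
    return [phi(i) for i in range(1, k + 1)]


def functions_needed(k: int) -> int:
    """Largest index appearing in the schedule for k iterations."""
    return max(schedule(k))
```

For example, eight iterations use the indices 1, 2, 1, 3, 1, 2, 1, 4, so four hash functions suffice instead of eight.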

3.2 A Technical Lemma 

We now prove a preliminary Lemma which is crucial in allowing us to achieve
logarithmic key size for our UOWHF construction. This Lemma abstracts the
property of the “Shoup domain extension” technique we use to construct the
RGRI: intuitively, the Lemma proves a preliminary result that will allow us
later to claim that the distribution induced by the RGRI is not that far from the
distribution induced by the GRI with distinct (i.e. non-reused) hash functions.

The goal of the Lemma is to count how many input pairs lead to two specific
values $a_0, a_1$ as outputs of the RGRI.

Lemma 1. Fix two arbitrary values $a_0, a_1 \in \{0,1\}^n$ and an integer $i$. The num-
ber of pairs $[(x_0, z_0, h_1, \dots, h_m), (x_1, z_1, h_1, \dots, h_m)]$ such that

$g^i(x_0, z_0, h_1, \dots, h_m) = a_0$ and $g^i(x_1, z_1, h_1, \dots, h_m) = a_1$

is bounded by $2^{2\ell k} |\mathcal{H}|^m$.

Note that in the Lemma we are counting the pairs with possibly distinct inputs
$x, z$ but the same hash functions $h_i$.

Proof: To prove the Lemma we use a “key-reconstruction” strategy introduced
by Shoup. The algorithm in Figure 1, on input $i \in [0..k]$, $z_0, z_1 \in \{0,1\}^{\ell k}$
and $a_0, a_1 \in \{0,1\}^n$, generates a pair of inputs $(x_0, h)$ and $(x_1, h)$, where
$h = (h_1, \dots, h_m)$, such that the output of the $i$-th iterate is $a_0$ and $a_1$, i.e.

$g^i(x_0, z_0, h_1, \dots, h_m) = a_0$ and $g^i(x_1, z_1, h_1, \dots, h_m) = a_1$

We prove that this algorithm outputs all possible input pairs $(x_0, h)$ and $(x_1, h)$
with some probability. To complete the proof of the claim we show that the total
number of distinct outputs of the algorithm is $|\mathcal{H}|^m$ (the Lemma follows since
there are $2^{2\ell k}$ possible values of $z_0, z_1$).

The high-level idea of the Shoup reconstruction strategy described in Figure 1
is the following. Consider the simple case of the randomized iterate function
$g^k$ (where a different hash function is used after each iterate). Since we
use different hash functions at every iterate, we choose all the hash functions


162 S. Ames, R. Gennaro, and M. Venkitasubramaniam 


$h_1, \ldots, h_{i-1}, h_{i+1}, \ldots, h_m$ arbitrarily, except the one in the $i$-th iterate (i.e. $h_i$).
Using $x_0, z_0, h_1, \ldots, h_{i-1}$ and $x_1, z_1, h_1, \ldots, h_{i-1}$ we compute $y_0, y_1$ as the
outputs of $f \circ g^{i-1}$. We then choose $h_i$ so that $h_i(y_0 \,\|\, z_{0,[(i-1)\ell+1 \ldots i\ell]}) = a_0$ and
$h_i(y_1 \,\|\, z_{1,[(i-1)\ell+1 \ldots i\ell]}) = a_1$ hold simultaneously. This is possible since $\mathcal{H}$ is a
pairwise-independent family. Furthermore, the number of such functions $h_i$ is
equal to $|\mathcal{H}|/2^{2n}$. Observe that every input pair satisfying the conditions is
output by the strategy for some random choices, and every random choice yields
a different output satisfying the conditions. Therefore, the total number of pairs
satisfying the conditions equals the total number of random choices made by the
strategy, and that is $2^{2n} \cdot 2^{2\ell k} \cdot |\mathcal{H}|^{m-1} \times |\mathcal{H}|/2^{2n} = 2^{2\ell k} |\mathcal{H}|^m$.
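The count $|\mathcal{H}|/2^{2n}$ used above can be checked by brute force on a tiny family. The sketch below assumes the family of all affine maps $x \mapsto Ax + b$ over GF(2) from $n+\ell$ bits to $n$ bits, which is one standard pairwise-independent family; the paper does not fix a particular family, and the function names are hypothetical.

```python
from itertools import product

def affine_hash(rows, b, x):
    """Evaluate x -> Ax + b over GF(2); rows are the bitmask rows of A."""
    out = 0
    for idx, row in enumerate(rows):
        out |= (bin(row & x).count("1") & 1) << idx
    return out ^ b

def count_consistent(n, ell, x0, a0, x1, a1):
    """Return (|H|, number of h in H with h(x0) = a0 and h(x1) = a1)."""
    width = n + ell
    total = hits = 0
    for rows in product(range(1 << width), repeat=n):
        for b in range(1 << n):
            total += 1
            if affine_hash(rows, b, x0) == a0 and affine_hash(rows, b, x1) == a1:
                hits += 1
    return total, hits

# Toy parameters n = 2, ell = 1: |H| = 256, so |H| / 2^{2n} = 16.
total, hits = count_consistent(2, 1, 0b001, 0b10, 0b110, 0b01)
```

For any two distinct inputs and any two target values the hit count is the same, which is exactly the property the counting argument relies on.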

However, this procedure does not work for the reusable randomized iterate
since the hash functions are recycled. Instead, we consider segments and perform
a “right-to-left” sweep from the $i$-th iterate to the first iterate, ensuring that each
segment is locally consistent. More precisely, in each segment, for a particular $a$,
the algorithm selects hash functions and a string $x$ such that if $x$ is fed as input
to the $j$-th iterate, then the output of the computation at the $i$-th iterate ($i > j$)
is $a$. For the segments to compose, we need to ensure that the hash functions
selected by different segments do not conflict with each other, and that is the
technical part of the proof. To extend the algorithm to achieve consistency for
two inputs it suffices to observe that for all $x_0 \neq x_1$ and arbitrary values $a_0, a_1$,
there exists an $h$ such that $h(x_0) = a_0$ and $h(x_1) = a_1$. The formal description
of the algorithm is presented in Figure 1.

First, we prove correctness and then compute the number of colliding pairs.

Sub-Claim 1. If the algorithm in Figure 1 outputs $(x_0, h), (x_1, h)$, then it holds
that $g^i(x_0, z_0, h) = a_0$ and $g^i(x_1, z_1, h) = a_1$.

Proof: Every iteration of the algorithm considers the segment from the $j$-th
iterate to the $i$-th iterate and achieves the following: if $x_0^j$ (and $x_1^j$) is fixed as the
partial input to the $j$-th iterate, then $a_0$ (and $a_1$) is the output of the $i$-th iterate.
This follows from the fact that $h_{\phi(i)}$ is assigned a value at step 2(d), after the
output of the $(i-1)$-st iterate has been computed. It only remains to show that
two iterations do not assign values to the same hash function. The algorithm
assigns a value to a hash function in steps 2(b), 2(d) and 4. By construction, steps
2(b) and 4 only assign values to hash functions that have not been defined yet
(indicated by the flag being false). It suffices to ensure that there are no conflicts
in the assignment made at step 2(d). This is ensured by maintaining the invariant
that $h_{\phi(i)}$ is undefined before executing 2(d) in any iteration. Observe that, in
every iteration, $\phi(j) > \phi(i)$ and for all $c$ such that $j < c < i$, $\phi(c) < \phi(i)$. Hence,
before step 2(d) is reached in any iteration, the only hash functions that are
defined are those with indices $c$ such that $\phi(c) < \phi(j)$. □
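The index invariant used in this proof is easy to check mechanically. The sketch below assumes that step 2(a) of Figure 1 moves from $i$ to $j = i - 2^{\phi(i)-1}$, i.e. it clears the lowest set bit of $i$; the exponent is garbled in the figure, so this is an assumption chosen to be consistent with Definition 9. The helper names are hypothetical.

```python
def phi(n: int) -> int:
    """One more than the exponent of the largest power of 2 dividing n (n >= 1)."""
    return (n & -n).bit_length()

def sweep(i: int):
    """Segments (j, i) visited by the right-to-left sweep that starts at i."""
    segments = []
    while i != 0:
        j = i - (1 << (phi(i) - 1))  # assumed form of step 2(a)
        segments.append((j, i))
        i = j
    return segments

def check_invariant(i_start: int) -> bool:
    """Replay which hash indices get defined and check there is never a conflict."""
    defined = set()
    for j, i in sweep(i_start):
        inner = {phi(c) for c in range(j + 1, i)}
        if phi(i) in defined:                 # h_{phi(i)} must be free before 2(d)
            return False
        if j != 0 and not phi(j) > phi(i):    # phi(j) > phi(i) whenever j >= 1
            return False
        if any(p >= phi(i) for p in inner):   # phi(c) < phi(i) for j < c < i
            return False
        defined |= inner                      # hashes fixed in step 2(b)
        defined.add(phi(i))                   # hash fixed in step 2(d)
    return True
```

For example, starting from $i = 13$ the sweep visits the segments $(12, 13)$, $(8, 12)$ and $(0, 8)$, fixing $h_1$, $h_3$ and $h_4$ at step 2(d) in that order.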

Sub-Claim 2. The number of distinct pairs output by the Shoup Reconstruction
algorithm is bounded by $|\mathcal{H}|^m$.

Proof: From Sub-Claim 1 we know that every pair output by the algorithm
satisfies the condition that $a_0$ and $a_1$ are the outputs of the $i$-th iterate. Furthermore,
every pair that satisfies the condition occurs as an output for some choice
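Input: $i, z_0, z_1, a_0, a_1$

1. Set flags $F_0, \ldots, F_{m-1}$ to false. // Flags indicate which hash functions have been defined

2. while $i \neq 0$

   (a) $j \leftarrow i - 2^{\phi(i)-1}$ // The new condition will be at position $j$

   (b) Randomly choose $x_0^j, x_1^j$ from $\{0,1\}^n$. For all $j < c < i$, if $F_{\phi(c)} = $ false,
       randomly choose $h_{\phi(c)}$ from $\mathcal{H}$ and set $F_{\phi(c)} \leftarrow$ true.

   (c) Compute $y_0$ and $y_1$ by running the iterate from position $j$ up to the application of $f$ at position $i$:

       $x_0^j \xrightarrow{f} \cdot \xrightarrow{\| z_{0,[j\ell+1 \ldots (j+1)\ell]}} \cdot \xrightarrow{h_{\phi(j+1)}} \cdot \xrightarrow{f} \cdots \xrightarrow{h_{\phi(i-1)}} \cdot \xrightarrow{f} y_0$

       $x_1^j \xrightarrow{f} \cdot \xrightarrow{\| z_{1,[j\ell+1 \ldots (j+1)\ell]}} \cdot \xrightarrow{h_{\phi(j+1)}} \cdot \xrightarrow{f} \cdots \xrightarrow{h_{\phi(i-1)}} \cdot \xrightarrow{f} y_1$

   (d) Randomly choose $h \in \mathcal{H}$ conditioned on
       $h(y_0 \,\|\, z_{0,[(i-1)\ell+1 \ldots i\ell]}) = a_0$ and $h(y_1 \,\|\, z_{1,[(i-1)\ell+1 \ldots i\ell]}) = a_1$.
       Set $h_{\phi(i)} \leftarrow h$, $F_{\phi(i)} \leftarrow$ true.

   (e) $i \leftarrow j$, $a_0 \leftarrow x_0^j$, $a_1 \leftarrow x_1^j$

3. endwhile

4. For all $c$, if $F_{\phi(c)} = $ false, pick $h_{\phi(c)}$ uniformly from $\mathcal{H}$ and set $F_{\phi(c)}$ to true.

5. output $(x_0 = x_0^0, h_1, \ldots, h_m)$, $(x_1 = x_1^0, h_1, \ldots, h_m)$


Fig. 1. Shoup Reconstruction Algorithm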


made by the algorithm and each choice made by the algorithm yields distinct
outputs. Therefore, it suffices to compute the total number of choices made by
the algorithm. To compute the number of pairs, observe that, for every choice
made for $x_0^j$ and $x_1^j$ (such that $x_0^j \neq x_1^j$) in step 2(b), the number of hash functions
$h$ such that

$$h(y_0 \,\|\, z_{0,[(i-1)\ell+1 \ldots i\ell]}) = a_0 \quad\text{and}\quad h(y_1 \,\|\, z_{1,[(i-1)\ell+1 \ldots i\ell]}) = a_1$$

is $|\mathcal{H}|/2^{2n}$, by the pairwise independence property. We treat the choices made for $x_0^j$,
$x_1^j$ as a choice made for $h_{\phi(i)}$, set in step 2(d). Thus, the number of choices for
the hash function in step 2(d) is at most $2^{2n} \times |\mathcal{H}|/2^{2n} = |\mathcal{H}|$. The only other choices
are the hash functions picked in steps 2(b) and 4. Since they can take any value,
they have $|\mathcal{H}|$ many choices. Hence, corresponding to every hash function the
algorithm makes $|\mathcal{H}|$ many choices. Thus, the total number of pairs is bounded
by $|\mathcal{H}|^m$. □

This concludes the proof of Lemma 1. □

The following Corollary is proven by using the same counting argument and the
same “reconstruction strategy” of Lemma 1 (intuitively, the bound results from
the fact that you can choose $x$ in $2^n$ ways, $z$ in $2^{\ell k}$ ways, $m - 1$ hash functions
uniformly at random in $\mathcal{H}$, and the hash function $h_i$ via pairwise independence
among $|\mathcal{H}|/2^{2n}$ possible candidates).

Corollary 1. Fix arbitrary values $a_0, a_1 \in \{0,1\}^n$, $y \in \{0,1\}^{n+\ell}$ and an integer
$i$. The number of inputs $(x, z, h_1, \ldots, h_m)$ such that

$$g^i(x, z, h_1, \ldots, h_m) = a_0 \quad\text{and}\quad h_i(y) = a_1$$

is bounded by $2^{\ell k - n} \cdot |\mathcal{H}|^m$. Moreover, there exists a polynomial time algorithm
that samples such an input uniformly at random.

Remark: We point out that the “reconstruction” property outlined in Lemma
1 is exactly what is needed in order to prove the security of our UOWHF with
$O(n \log n)$ key based on the RGRI.

This is in contrast to the case of PRG [ZJ where any PRG for space-bounded
computation would work to “de-randomize” the seed from $n^2$ to $n \log n$. We
can show that the particular space-bounded PRG used in jZj satisfies a Lemma
similar to Lemma 1 and therefore could be used to reduce the size of the key
of our UOWHF. For simplicity we just show the construction based on Shoup’s
technique.


4 Constructions of Universal One-Way Hash Functions 

In this section, we show how to construct second preimage resistant functions
from regular one-way functions. We start with a simple construction (that
already improves the efficiency from previous work) of quadratic key size. We
then provide a more efficient and essentially optimal solution with $O(n \log n)$
key size. Note that our functions compress a single bit (higher compression can
be achieved by standard modes of iteration). Note also that UOWHFs can be
easily built from second preimage resistant families.


4.1 A Construction with Linear Key Size 

Definition 10. Let $f : \{0,1\}^n \to \{0,1\}^n$ and let $\mathcal{K} = \{0,1\}^n \times \mathcal{H}^{n+1}$ where $\mathcal{H}$
is an efficient family of pairwise-independent hash functions from $\{0,1\}^{n+1}$ to
$\{0,1\}^n$. Define the function $g(z, k)$ with input space $z \in \{0,1\}^{n+1}$ and key-space
$k = (x, h_1, \ldots, h_{n+1}) \in \mathcal{K}$ as follows:

$$g(z, (x, h_1, \ldots, h_{n+1})) = g^{n+1}(x, z, h_1, \ldots, h_{n+1})$$

where $g^i$ is the Generalized Randomized Iterate with $\ell = 1$.

Theorem 1. Suppose $f$ is a $2^r$-regular one-way function. Then $g$ defined
according to Definition 10 is a second preimage resistant function family.
Proof Overview: To understand how our construction works, let us assume (as
a simplifying assumption) that we can uniformly sample pairs $(a_1, a_2)$ such that
$f(a_1) = f(a_2)$. Let us refer to such pairs as siblings for $f$.

Given such a pair it is possible to set up the hash functions in the above
construction so that if the adversary finds a collision, then we invert the one-way
function on a point $y$. Intuitively this is done as follows: given a random
input $z$ for the UOWHF, we choose the hash functions (i.e. the key $k$) so that
$g^i(z, k) = a_1$ and $h_i(y \| b) = a_2$ for a random index $i$ and a random bit $b$. We
then run the adversary on $z, k$ and if the adversary finds a collision $z'$, with
non-negligible probability the collision “goes through” $a_2$ at index $i$, i.e. $g^i(z', k) = a_2$,
allowing us to find a preimage of $y$.

The intuition here is that given any input $z$ and key $k$, at each iterate the
input going into the one-way function has at most $2^r$ collisions w.r.t. $f$. For a collision
to occur at a particular iterate, it must be the case that some range element $y$ of
the one-way function $f$ occurs at the previous iterate and the hash function
takes $y$ and an input bit into one of the $2^r$ collisions in the next iterate. Since
there are at most $2^{n-r}$ range elements, in expectation over hash functions, the
number of possible inputs at the previous iterate that are mapped into the $2^r$
collisions is small, in fact $O(1)$. Thus the hash functions selected above will
succeed with high probability.

But how do we get to sample $a_1, a_2$, i.e. siblings for $f$, in the first place? For
this we use the adversary again. Indeed, when an adversary finds a collision to
input $z$ (say $z'$), it must be that at some iterate, the inputs into the intermediate
hash functions are different and the outputs to the next iterate are strings $a_1$ and
$a_2$ such that $f(a_1) = f(a_2)$, i.e. siblings for $f$. It remains to argue that sampling
$a_1$ and $a_2$ by first querying the adversary is good enough, and this is established
using a collision-probability-type analysis. We now proceed to a formal proof.
Proof: Assume for contradiction that there exists an adversary $A$ and a polynomial
$p(\cdot)$ such that for infinitely many lengths $n$, the probability with which $A$ finds
a collision on a random input $z \in \{0,1\}^{n+1}$ and key $k = (x, h) \in \mathcal{K}$ is at least
$\epsilon \geq \frac{1}{p(n)}$. We assume for simplicity that $A$ is deterministic. Fix a particular $n$
for which this happens. Using $A$, we construct a machine $M$ that inverts $f$ with
probability that is polynomially related to $\epsilon$ and thus arrive at a contradiction.

The machine $M$ on input $y \in \{0,1\}^n$ internally incorporates the code of $A$
and proceeds as follows:

1. Sample a random input $z$ and key $k = (x, h)$. Internally run $A$ on input
$(z, k)$. If $A$ fails to return a collision, halt outputting $\perp$. Otherwise, let $z'$ be
the output of $A$.

2. Let $i$ be the smallest index such that $f(g^{i-1}(z, k)) \| z_i \neq f(g^{i-1}(z', k)) \| z'_i$
and $f(g^i(z, k)) = f(g^i(z', k))$ (since $g(z, k) = g(z', k)$ such an $i$ must exist).
Let $a_1 = g^i(z, k)$ and $a_2 = g^i(z', k)$. It follows now that $f(a_1) = f(a_2)$. For
any two colliding inputs such as $z$ and $z'$ with key $k$, we call this $i$ the
colliding-index.

3. Choose $z^*, k^* = (x^*, h_1^*, \ldots, h_{n+1}^*)$ and a random bit $b$ such that $g^i(z^*, k^*) = a_1$
and $h_i^*(y \| b) = a_2$. This can be done using the pairwise independence
property of $\mathcal{H}$. More precisely, choose $z^*, x^*$ and all the hash functions except
$h_i^*$ at random and set $h_i^*$ so that both the conditions hold. Run $A$ on input
$(z^*, k^*)$. If $A$ fails to return a collision or such a hash function $h_i^*$ cannot be
sampled (see footnote 4), halt outputting $\perp$. Otherwise, let $z''$ be the output of $A$.

4. If $f(g^{i-1}(z'', k^*)) \neq y$, halt outputting $\perp$. Otherwise, output $g^{i-1}(z'', k^*)$.
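Step 2 of the machine, locating the colliding-index, can be illustrated on toy parameters. The sketch below is not the reduction itself: it assumes a toy 2-regular function `f` (dropping the low bit, certainly not one-way), toy affine hashes over GF(2), and a brute-force search in place of the adversary $A$. All names are hypothetical.

```python
import random
from itertools import product

N = 3  # toy output length; inputs z have N + 1 bits

def f(v):
    # Toy 2-regular function on N-bit values (every image has two preimages).
    return v >> 1

def random_hash(rng):
    # Toy affine map over GF(2) from N + 1 bits to N bits: (row masks, offset).
    return [rng.getrandbits(N + 1) for _ in range(N)], rng.getrandbits(N)

def apply_hash(h, v):
    rows, b = h
    out = 0
    for idx, row in enumerate(rows):
        out |= (bin(row & v).count("1") & 1) << idx
    return out ^ b

def trace(x, z, hashes):
    # [g^0, g^1, ..., g^{N+1}] of the Generalized Randomized Iterate with ell = 1.
    g = [x]
    for z_bit, h in zip(z, hashes):
        g.append(apply_hash(h, (f(g[-1]) << 1) | z_bit))
    return g

def colliding_index(x, hashes, z, z2):
    # Smallest i whose hash inputs differ while f(g^i) agrees on both traces.
    t, t2 = trace(x, z, hashes), trace(x, z2, hashes)
    for i in range(1, len(z) + 1):
        differs = (f(t[i - 1]), z[i - 1]) != (f(t2[i - 1]), z2[i - 1])
        if differs and f(t[i]) == f(t2[i]):
            return i, t[i], t2[i]
    return None

def find_collision(x, hashes):
    # Brute-force stand-in for the adversary: 16 inputs map to 8 outputs.
    seen = {}
    for z in product((0, 1), repeat=N + 1):
        out = trace(x, z, hashes)[-1]
        if out in seen:
            return seen[out], z
        seen[out] = z
    return None
```

Whenever two distinct inputs collide under the same key, the returned pair $(a_1, a_2)$ satisfies $f(a_1) = f(a_2)$, which is the sibling property the proof uses.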

It follows from the construction that if $M$ outputs $w$, then $f(w) = y$. We
now proceed to compute the success probability of $M$. But first, we require
the following definition. Define sets $N(i, a_1, a_2)$ to contain all input-key pairs
$(z, k)$ such that the following hold true: $f(a_1) = f(a_2)$ and $g^i(z, k) = a_1$, and
$A$ on input $(z, k)$ returns $z'$ such that $g^i(z', k) = a_2$ and $i$ is the colliding-index.
We first express the success probability of $M$ using these sets.

Claim 1. The probability with which $M$ succeeds in inverting $f$ is

$$2^{n+r-1} \sum_{(i, a_1, a_2)} \frac{|N(i, a_1, a_2)|^2}{\left(2^{2n+1} |\mathcal{H}|^{n+1}\right)^2}$$

Proof: Given a tuple $(z^*, k^*, i, a_1, a_2)$ such that $(z^*, k^*) \in N(i, a_1, a_2)$, define
the following events:

Event $E_1$: The randomly chosen input-key pair $(z, k)$ by $M$ in Step 1 is in
$N(i, a_1, a_2)$.

Since the input and key are chosen uniformly at random, it holds that

$$\Pr[E_1] = \frac{1}{2^{n+1}} \times \frac{1}{2^n} \times \frac{1}{|\mathcal{H}|^{n+1}} \times |N(i, a_1, a_2)| = \frac{|N(i, a_1, a_2)|}{2^{2n+1} |\mathcal{H}|^{n+1}}$$

Event $E_2$: If $A$ on input $(z^*, k^*)$ returns $z'$ (where $k^* = (x, h_1, \ldots, h_{n+1})$),
this event denotes that $M$’s random choice is $b = z'_i$ and $M$’s input is $y$ such
that $f(g^{i-1}(z', k^*)) = y$. Therefore, $h_i(y \| b) = h_i(y \| z'_i) = a_2$.
The probability that $b = z'_i$ is $1/2$. Therefore, since $f$ is a $2^r$-regular OWF,

$$\Pr[E_2] = \frac{1}{2} \cdot \frac{2^r}{2^n} = \frac{2^{r-1}}{2^n}.$$

Event $E_3$: $M$ chooses $z^*, k^*$ in Step 3.

From the pairwise-independence property of $\mathcal{H}$, it follows that (see footnote 5)

$$\Pr[E_3] = \frac{1}{2^{2n+1} |\mathcal{H}|^{n} \cdot \frac{|\mathcal{H}|}{2^{2n}}} = \frac{1}{2 |\mathcal{H}|^{n+1}}$$

It follows from the description that for any tuple $(z^*, k^*, i, a_1, a_2)$ such that
$(z^*, k^*) \in N(i, a_1, a_2)$, if $E_1$, $E_2$ and $E_3$ occur, $M$ inverts $y$. Note that $E_1$, $E_2$
and $E_3$ are independent. Therefore, for a fixed tuple $(z^*, k^*, i, a_1, a_2)$ such that
$(z^*, k^*) \in N(i, a_1, a_2)$ the probability that $E_1$, $E_2$ and $E_3$ occur is

$$\frac{|N(i, a_1, a_2)|}{2^{2n+1} |\mathcal{H}|^{n+1}} \times \frac{2^{r-1}}{2^n} \times \frac{1}{2 |\mathcal{H}|^{n+1}}$$

4 This occurs when $a_1 \neq a_2$ and $f(g^{i-1}(z^*, k^*)) = y$ and $z^*_i = b$.

5 $z$, $x$ and all the hash functions except $h_i$ are randomly chosen. There are $2^{2n+1} |\mathcal{H}|^{m-1}$
such tuples. $h_i$ is chosen so that two of its values are fixed. Since $\mathcal{H}$ is a pairwise-independent
family of hash functions, there are exactly $|\mathcal{H}|/2^{2n}$ such functions. Finally,
one of these tuples is chosen uniformly at random.
It follows from the definition of the sets $N(\cdot, \cdot, \cdot)$ that for every $(z, k)$ there exists
at most one tuple $(i, a_1, a_2)$ such that $(z, k) \in N(i, a_1, a_2)$. Therefore, the success
probability of $M$ can be expressed as the sum of the success probability of $M$
on each tuple $(z^*, k^*, i, a_1, a_2)$ such that $(z^*, k^*) \in N(i, a_1, a_2)$. More precisely,
the success probability of $M$ is

$$\sum_{(i, a_1, a_2)} \; \sum_{(z^*, k^*) \in N(i, a_1, a_2)} \frac{|N(i, a_1, a_2)|}{2^{2n+1} |\mathcal{H}|^{n+1}} \times \frac{2^{r-1}}{2^n} \times \frac{1}{2 |\mathcal{H}|^{n+1}}$$

$$= \sum_{(i, a_1, a_2)} \frac{|N(i, a_1, a_2)|^2}{2^{2n+1} |\mathcal{H}|^{n+1}} \times \frac{2^{r-1}}{2^n} \times \frac{1}{2 |\mathcal{H}|^{n+1}}$$

$$= 2^{n+r-1} \sum_{(i, a_1, a_2)} \frac{|N(i, a_1, a_2)|^2}{\left(2^{2n+1} |\mathcal{H}|^{n+1}\right)^2}$$

□

We now relate this expression to the success probability of A. 

Claim 2. If $A$ succeeds with probability $\epsilon$ then

$$\sum_{(i, a_1, a_2)} \frac{|N(i, a_1, a_2)|^2}{\left(2^{2n+1} |\mathcal{H}|^{n+1}\right)^2} \geq \frac{\epsilon^2}{n 2^{n+r}}$$

Proof: Since for every pair $(z, k)$ there exists at most one tuple $(i, a_1, a_2)$ such
that $(z, k) \in N(i, a_1, a_2)$, and by definition if $(z, k) \in N(i, a_1, a_2)$ then $A$ succeeds
on input $(z, k)$, we have that the success probability of $A$ is

$$\frac{1}{2^{2n+1} |\mathcal{H}|^{n+1}} \times \sum_{(i, a_1, a_2)} |N(i, a_1, a_2)| = \epsilon$$

Let us consider the sum on the left-hand side and use the Cauchy-Schwarz
inequality to obtain a bound on the sum of the squares of each term. It suffices
to consider the sum over all tuples $(i, a_1, a_2)$ such that $N(i, a_1, a_2)$ is not empty.
In particular, they are not empty only if $f(a_1) = f(a_2)$. Therefore, the total
number of such tuples is at most $n 2^{n+r}$. Using the Cauchy-Schwarz inequality,
we have that

$$\sum_{(i, a_1, a_2)} \frac{|N(i, a_1, a_2)|^2}{\left(2^{2n+1} |\mathcal{H}|^{n+1}\right)^2} \geq \frac{\epsilon^2}{n 2^{n+r}}$$

□

Now, we conclude the proof of the theorem. Applying Claim 2 to Claim 1 we
obtain that the success probability of $M$ is at least $2^{n+r-1} \times \frac{\epsilon^2}{n 2^{n+r}} = \frac{\epsilon^2}{2n}$, which
is non-negligible. Therefore, $M$ inverts $f$ with non-negligible probability and we
arrive at a contradiction. □
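For readers who want the two claims chained explicitly, the arithmetic can be written out as follows. The shorthand $p_t$ and $T$ is introduced here only for this worked step and does not appear in the original; $T$ denotes the set of tuples $t = (i, a_1, a_2)$ with $N(t)$ non-empty, so $|T| \leq n 2^{n+r}$ as stated in the proof of Claim 2.

```latex
\[
  p_t = \frac{|N(t)|}{2^{2n+1}\,|\mathcal{H}|^{n+1}}, \qquad \sum_{t \in T} p_t = \epsilon .
\]
\[
  \epsilon^2 = \Bigl(\sum_{t \in T} 1 \cdot p_t\Bigr)^2
  \;\leq\; |T| \sum_{t \in T} p_t^2
  \;\leq\; n\,2^{n+r} \sum_{t \in T} p_t^2
  \quad\Longrightarrow\quad
  \sum_{t \in T} p_t^2 \;\geq\; \frac{\epsilon^2}{n\,2^{n+r}} .
\]
\[
  \Pr[M \text{ inverts } f]
  = 2^{n+r-1} \sum_{t \in T} p_t^2
  \;\geq\; 2^{n+r-1} \cdot \frac{\epsilon^2}{n\,2^{n+r}}
  = \frac{\epsilon^2}{2n} .
\]
```

The first display restates the success probability of $A$, the second is the Cauchy-Schwarz step of Claim 2, and the third substitutes it into Claim 1.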


4.2 A Construction with Logarithmic Key Size 

We now show how to construct a more efficient second preimage resistant family 
from regular one-way functions, by showing that if / is a regular OWF then the 
Reusable Generalized Randomized Iterate is second preimage resistant. 
Definition 11. Let $f : \{0,1\}^n \to \{0,1\}^n$ and let $\mathcal{K} = \{0,1\}^n \times \mathcal{H}^m$ where
$\mathcal{H}$ is an efficient family of pairwise-independent hash functions from $\{0,1\}^{n+1}$
to $\{0,1\}^n$ and $m = O(\log n)$. Define the function $g(z, k)$ with input space $z \in
\{0,1\}^{n+1}$ and key-space $k = (x, h_1, \ldots, h_m) \in \mathcal{K}$ as follows:

$$g(z, (x, h_1, \ldots, h_m)) = g^{n+1}(x, z, h_1, \ldots, h_m)$$

where $g^i$ is the Reusable Generalized Randomized Iterate with $\ell = 1$.

Theorem 2. Suppose $f$ is a $2^r$-regular one-way function. Then $g$ defined
according to Definition 11 is a second preimage resistant function family.

Proof: Assume for contradiction that there exists an adversary $A$ and a polynomial
$p(\cdot)$ such that for infinitely many lengths $n$, the probability with which $A$ finds
a collision on a random input $z \in \{0,1\}^{n+1}$ and key $k = (x, h) \in \mathcal{K}$ is $\epsilon \geq \frac{1}{p(n)}$. As
before, we assume for simplicity that $A$ is deterministic.

Fix a particular $n$ for which this happens. Using $A$, we construct a machine $M$
that inverts $f$ with probability that is polynomially related to $\epsilon$ and thus arrive
at a contradiction. The machine $M$ on input $y \in \{0,1\}^n$ internally incorporates
the code of $A$ and proceeds as follows:

1. Sample a random input $z$ and key $k = (x, h)$. Internally run $A$ on input
$(z, k)$. If $A$ fails to return a collision, halt outputting $\perp$. Otherwise, let $z'$ be
the output of $A$.

2. Let $i$ be the colliding-index. Let $a_1 = g^i(z, k)$ and $a_2 = g^i(z', k)$.

3. Choose $z^*, k^* = (x^*, h_1^*, \ldots, h_m^*)$ and a random bit $b$ such that $g^i(z^*, k^*) = a_1$
and $h^*_{\phi(i)}(y \| b) = a_2$. This can be done in polynomial time following Corollary
1. Internally run $A$ on input $(z^*, k^*)$. If $A$ fails to return a collision, halt
outputting $\perp$. Otherwise, let $z''$ be $A$’s output.

4. If $f(g^{i-1}(z'', k^*)) \neq y$, halt outputting $\perp$. Otherwise, output $g^{i-1}(z'', k^*)$.

As before, we define sets $N(i, a_1, a_2)$ that satisfy the same condition, with the
exception that we rely on the Reusable Generalized Randomized Iterate instead of
the Generalized Randomized Iterate. The next claim relates these sets to
the success probability of $M$.

Claim 3. The probability with which $M$ succeeds in inverting the one-way function
$f$ is $2^{n+r-1} \sum_{(i, a_1, a_2)} |N(i, a_1, a_2)|^2 / \left(2^{2n+1} |\mathcal{H}|^m\right)^2$.


Proof: Consider the events $E_1$, $E_2$ and $E_3$ exactly as before. We now have that
given a tuple $(z, k, i, a_1, a_2)$,

- Probability that $E_1$ occurs is $1/2^{n+1} \times 1/2^n \times 1/|\mathcal{H}|^m \times |N(i, a_1, a_2)| =
|N(i, a_1, a_2)| / (2^{2n+1} |\mathcal{H}|^m)$.

- Probability that $E_2$ occurs is $2^{r-1}/2^n$ as before.

- Probability that $E_3$ occurs given that $E_1$ and $E_2$ occur is $1/\bigl(2^{2n+1} \cdot \frac{|\mathcal{H}|^m}{2^{2n}}\bigr) =
1/(2 |\mathcal{H}|^m)$. This follows from Corollary 1 for $\ell = 1$ and $k = n + 1$ (which
are the parameters used in this construction).
Again, we have that every $(z, k)$ belongs to at most one set $N(i, a_1, a_2)$. Therefore,
the success probability of $M$ is

$$\sum_{(i, a_1, a_2)} \; \sum_{(z, k) \in N(i, a_1, a_2)} \frac{|N(i, a_1, a_2)|}{2^{2n+1} |\mathcal{H}|^m} \times \frac{2^{r-1}}{2^n} \times \frac{1}{2 |\mathcal{H}|^m} = 2^{n+r-1} \sum_{(i, a_1, a_2)} \frac{|N(i, a_1, a_2)|^2}{\left(2^{2n+1} |\mathcal{H}|^m\right)^2}$$

□

The next claim follows identically to Claim 2.

Claim 4. If $A$ succeeds with probability $\epsilon$ then $\sum_{(i, a_1, a_2)} |N(i, a_1, a_2)|^2 / \left(2^{2n+1} |\mathcal{H}|^m\right)^2 \geq \frac{\epsilon^2}{n 2^{n+r}}$.

As before, applying Claim 4 to Claim 3 we obtain that the success probability
of $M$ is at least $\frac{\epsilon^2}{2n}$ and thus we arrive at a contradiction. □

5 PRG Construction and Hardness Amplification 

The idea of iterating a one-way permutation $f$ on itself to obtain a PRG originates
from the work of Blum, Micali and Yao DEI-. Since $f$ is a permutation,
the function $f^{(i)} = f \circ \ldots \circ f$ ($f$ iterated on itself $i$ times) is also one-way. This
means that the hardcore bit of every intermediate step is unpredictable. Iterating
$n + 1$ times on a random input of length $n$ and outputting all the hardcore
bits would then yield a PRG that stretches by 1 bit. We refer to this as the BMY
construction (see footnote 6).
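The shape of the BMY construction can be sketched as follows. This is only a structural illustration under stated assumptions: the hardcore bit is taken to be the Goldreich-Levin inner product with a public string $r$ (the predicate the paper names in Theorem 3), and `toy_perm` is a toy permutation with no one-wayness whatsoever. The function names are hypothetical.

```python
def gl_bit(x: int, r: int) -> int:
    """Goldreich-Levin predicate: inner product of x and r over GF(2)."""
    return bin(x & r).count("1") & 1

def bmy(f, x: int, r: int, n: int):
    """Hardcore bits of x, f(x), ..., f^(n)(x): n + 1 output bits from an n-bit seed."""
    bits = []
    for _ in range(n + 1):
        bits.append(gl_bit(x, r))
        x = f(x)
    return bits

# Toy usage: an affine permutation of n-bit values (a bijection because 5 is odd).
n = 8
toy_perm = lambda v: (5 * v + 3) % (1 << n)
stream = bmy(toy_perm, x=0b10110010, r=0b01101001, n=n)
```

With a genuine one-way permutation in place of `toy_perm`, every intermediate value is uniformly distributed, which is the property that fails for general one-way functions and motivates the randomization step discussed next.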

This approach, unfortunately, does not work for general one-way functions. For
the special case of regular one-way functions, Goldreich, Krawczyk and Luby 0
showed how to extend the BMY construction by adding a randomization step
using an $n$-wise independent hash function between every two applications of
$f$. Haitner et al. [Zj simplified the construction to use just pairwise hashing
and further derandomized the construction by showing how to generate the $n$
hash functions required at the randomization steps using just $n \log n$ bits, thus
obtaining a PRG of seed length $O(n \log n)$.

Haitner et al. also showed that the same randomized iterate can be used
for hardness amplification to obtain a strong one-way function from any regular
weakly one-way function with unknown regularity. They also showed that similar
derandomization yielded corresponding efficiency gains.

Using the Reusable Generalized Randomized Iterate, we obtain analogous
PRG constructions and hardness amplification with the same efficiency. More
precisely, we obtain the following results.

Theorem 3. Let $f : \{0,1\}^n \to \{0,1\}^n$ be a regular one-way function and $\mathcal{H}$
be an efficient family of pairwise-independent length preserving hash functions.
Define $G : \{0,1\}^{2n} \times \mathcal{H}^m \to \{0,1\}^{2n+1} \times \mathcal{H}^m$ as

$$G(x, r, h) = (b(f^0(x, h), r), \ldots, b(f^n(x, h), r), r, h)$$

where $f^k(x, h_1, \ldots, h_m) = f(g^k(x, h_1, \ldots, h_m))$ and $g^k$ is the RGRI defined by
$x, h_1, \ldots, h_m$ with $\ell = 0$ and $b$ is the Goldreich-Levin hardcore predicate. Then
$G$ is a pseudorandom generator.

6 If $f$ is a permutation over $n$-bit strings a more efficient construction is to set the
generator $G$ as $G(x) = f(x).b(x)$. However this uses in a crucial way the property
that $f$ is a permutation (since if $x$ is uniform then $f(x)$ is also uniform).

Theorem 4. Let $f$ be a $\frac{1}{p(n)}$-weak one-way function for some polynomial $p(\cdot)$ (see footnote 7).
Let $k = 4np(n)$ and $m = \lceil \log k \rceil$. For input $x \in \{0,1\}^n$, $h = [h_1, \ldots, h_m] \in \mathcal{H}^m$,
define $g(x, h) = (f^k(x, h), h)$ where $f^k$ is the Reusable Randomized Iterate of $f$.
Then, $g$ is a (strong) one-way function.

The proofs of both these theorems appear in the full-version of the paper and 
on a high-level follow the proofs presented in 0 . 

6 Discussion and Conclusions 

This paper presented the Reusable Generalized Randomized Iterate, and its ap- 
plication to new efficient constructions of Universal One-Way Hash Functions 
based on regular one-way functions. These are the first such efficient construc- 
tions of UOWHF from regular one-way functions of unknown regularity. 

We also showed that the Reusable Generalized Randomized Iterate can be 
used to construct PRGs based on regular-one way functions, obtaining an alter- 
native proof of a result by pi . 

An interesting question raised by our work is the following: can we replace
Shoup’s technique for TCR domain extension with any appropriate log-space
derandomizer? This is not immediately clear, since the reconstruction algorithm
of Lemma 1 plays a crucial role in our construction and such a property does not
follow from the definition of derandomizers (although the current derandomizers
indeed have that property).

A more conceptual contribution of this paper is to show that by combin- 
ing techniques from the collision-resistant hashing and PRG toolboxes we can 
improve efficiency in both areas. Following |DIHI| we believe that exploring the 
interplay between the two fields, and the possibility to apply techniques from 
one field to the other can lead to new and interesting discoveries. 

The works in jblbj highlight an inherent “black-box duality” between PRGs and
UOWHFs. Starting from the PRG constructions based on OWPs Q and OWFs
jTTj, one can obtain the UOWHF constructions based on OWPs [T2| and OWFs
jTSISj using the following “parallelism”. If there is “unpredictable” entropy in an
input to an application of the one-way function in the PRG construction from
which pseudo-entropy can be extracted, then there exists a symmetric TCR
construction with the same structure where the output of the application of the
one-way function has “inaccessible” entropy and can be compressed.

Our Generalized Randomized Iterate justifies this observation for the case of 
regular one-way functions in a more direct way, by showing a single algorithm 
that yields either a PRG or a UOWHF depending on the parameters. 

7 A function $f$ is an $\epsilon$-weak one-way function if no adversary can succeed in inverting
$f$ with probability better than $1 - \epsilon$.
The approaches in and in this paper can hopefully help in addressing the
following interesting open problem. Is there a transformation that takes any PRG
construction from a primitive $\mathcal{P}$ with $\ell$-bit expansion to a TCR construction from
the same primitive $\mathcal{P}$ with $\ell$-bit compression, and vice versa? For example, given
a OWP with a large $\ell$-bit hard-core function, we know how to build a PRG that
expands by $\ell$ bits per invocation of the OWP: is it possible to obtain a TCR
which compresses by $\ell$ bits per invocation of the OWP? Conversely, an answer
to the above general question would allow us to achieve more efficient PRG
constructions from stronger primitives such as collision-resistant hash functions.
Abstract. A perfect algebraic immune function is a Boolean function
with perfect immunity against algebraic and fast algebraic attacks. The
main results are that for a perfect algebraic immune balanced function
the number of input variables is one more than a power of two; for
a perfect algebraic immune unbalanced function the number of input
variables is a power of two. Also, for $n$ equal to a power of two, the
Carlet-Feng functions on $n+1$ variables and the modified Carlet-Feng functions
on $n$ variables are shown to be perfect algebraic immune functions.

Keywords: Boolean functions, Algebraic immunity, Fast algebraic 
attacks. 


1 Introduction 

The study of the cryptanalysis of the filter and combination generators of stream 
ciphers based on linear feedback shift registers (LFSRs) has resulted in a wealth 
of cryptographic criteria for Boolean functions, such as balancedness, high alge- 
braic degree, high nonlinearity, high correlation immunity and so on. An overview 
of cryptographic criteria for Boolean functions with extensive bibliography is 
given in 0 . 

In recent years, algebraic and fast algebraic attacks jl!5lfc>j have been regarded 
as the most successful attacks on LFSR-based stream ciphers. These attacks 
cleverly use overdefined systems of multivariable nonlinear equations to recover 
the secret key. Algebraic attacks make use of the equations by multiplying a 
nonzero function of low degree, while fast algebraic attacks make use of the 
equations by linear combination. 

Thus the algebraic immunity ($\mathcal{AI}$), the minimum algebraic degree of nonzero
annihilators of $f$ or $f + 1$, was introduced by W. Meier et al. m to measure
the ability of Boolean functions to resist algebraic attacks. It was shown by N.
Courtois and W. Meier jjjj that the maximum $\mathcal{AI}$ of $n$-variable Boolean functions is
$\lceil n/2 \rceil$. The properties and constructions of Boolean functions with maximum $\mathcal{AI}$

were researched in a large number of papers, e.g., fbll 511 till 8141241251 • 
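The definition of algebraic immunity can be made concrete with a small brute-force computation. The sketch below is an illustration only, not a method from the paper: it represents inputs as integer bitmasks, uses the fact that $g$ annihilates $f$ exactly when $g$ vanishes on the support of $f$, and decides existence of a low-degree annihilator by a rank computation over GF(2). All function names are hypothetical.

```python
from itertools import combinations

def monomials(n, d):
    """Bitmasks of all monomials of degree <= d in n variables (0 is the constant 1)."""
    return [sum(1 << v for v in vs) for k in range(d + 1) for vs in combinations(range(n), k)]

def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    rows, rank = list(rows), 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # eliminate the pivot's lowest set bit from the rest
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def has_annihilator(points, n, d):
    """True if some nonzero g of degree <= d vanishes on every given point."""
    mons = monomials(n, d)
    rows = []
    for x in points:
        row = 0
        for j, mono in enumerate(mons):
            if x & mono == mono:  # the monomial evaluates to 1 at x
                row |= 1 << j
        rows.append(row)
    return rank_gf2(rows) < len(mons)

def algebraic_immunity(truth_table, n):
    """Minimum degree of a nonzero annihilator of f or of f + 1."""
    ones = [x for x in range(1 << n) if truth_table[x]]
    zeros = [x for x in range(1 << n) if not truth_table[x]]
    for d in range(n + 1):
        if has_annihilator(ones, n, d) or has_annihilator(zeros, n, d):
            return d

# Majority on 3 variables reaches the maximum ceil(3/2) = 2.
maj3 = [1 if bin(x).count("1") >= 2 else 0 for x in range(8)]
ai_maj3 = algebraic_immunity(maj3, 3)
```

The search is exponential in $n$ and is meant only for checking small examples by hand.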
The resistance against fast algebraic attacks is not covered by algebraic
immunity J7I21 1 7|. At Eurocrypt 2006, F. Armknecht et al. |2j introduced an
effective algorithm for determining the immunity against fast algebraic attacks,
and showed that a class of symmetric Boolean functions (the majority functions)
have poor resistance against fast algebraic attacks despite their resistance against
algebraic attacks. Later M. Liu et al. mi stated that almost all the symmetric
functions, including those functions with good algebraic immunity, behave badly
against fast algebraic attacks. In m P. Rizomiliotis introduced a method to
evaluate the behavior of Boolean functions against fast algebraic attacks using
univariate polynomial representation. However, it is unclear what the maximum
immunity to fast algebraic attacks is.

A preprocessing of fast algebraic attacks on LFSR-based stream ciphers, which
use a Boolean function $f : GF(2)^n \to GF(2)$ as the filter or combination
generator, is to find a nonzero function $g$ of small algebraic degree such that the
multiple $gf$ has algebraic degree not too large j0|. N. Courtois |£j proved that for
any pair of positive integers $(e, d)$ such that $e + d \geq n$, there is a nonzero function
$g$ of degree at most $e$ such that $gf$ has degree at most $d$. This result reveals an
upper bound on maximum immunity to fast algebraic attacks. It implies that
the function $f$ has maximum possible resistance against fast algebraic attacks
if, for any pair of positive integers $(e, d)$ such that $e + d < n$ and $e < n/2$, there
is no nonzero function $g$ of degree at most $e$ such that $gf$ has degree at most $d$.
Such functions are said to be perfect algebraic immune ($\mathcal{PAI}$). Note that one
can use the fast general attack 0 Theorem 7.1.1] by splitting the function into
two, $f = h + l$, with $l$ being the linear part of $f$. In this case, $h = f + l$ rather
than $h = gf$ is used, then $e$ equals 1, i.e., the degree of the linear function $l$, and
$d$ equals the degree of the function $h$, i.e., the degree of $f$. Thus $\mathcal{PAI}$ functions
have algebraic degree at least $n - 1$.

A $\mathcal{PAI}$ function also achieves maximum $\mathcal{AI}$. As a consequence, a $\mathcal{PAI}$
function has perfect immunity against classical and fast algebraic attacks. Although
preventing classical and fast algebraic attacks is not sufficient for resisting
algebraic attacks on the augmented function m, the resistance against these attacks
depends on the update function and tap positions used in a stream cipher and
in actual fact it is not a property of the Boolean function. Thus the use of $\mathcal{PAI}$
functions does not guarantee that a stream cipher is not vulnerable to algebraic
attacks since the attacker can also exploit suitable relations for the augmented
functions as suggested in j6H2j .

It is an open question whether there are $\mathcal{PAI}$ functions for an arbitrary number
of input variables. This problem was also noticed in j2j at Asiacrypt 2008. It
seems that $\mathcal{PAI}$ functions are quite rare. In gj C. Carlet and K. Feng observed
that the Carlet-Feng functions on 9 variables are $\mathcal{PAI}$. One can check that the
Carlet-Feng functions on 5 variables are also $\mathcal{PAI}$ (see also m3). However, no
function is shown to be $\mathcal{PAI}$ for an arbitrary number of variables. On the contrary,
M. Liu et al. |T3 proved that no symmetric functions are $\mathcal{PAI}$, and in [2Sj the
authors proved that no rotation symmetric functions are $\mathcal{PAI}$ for an even number
(except a power of two) of variables.


174 M. Liu, Y. Zhang, and D. Lin 


In this paper, we study the upper bounds on the immunity to fast algebraic at- 
tacks, and solve the above question. The immunity against fast algebraic attacks 
is related to a matrix thanks to Theorem 1 of 0 . By a simple transformation on 
this matrix we obtain a symmetric matrix whose elements are the coefficients of 
the algebraic normal form of a given Boolean function. We improve the upper 
bounds on the immunity to fast algebraic attacks by proving that the symmetric 
matrix is singular in some cases. The results are that for an n-variable function,
we have: (1) if n is a power of 2 then a $\mathcal{PAI}$ function has algebraic degree n
(showing that the function is unbalanced); (2) if n is one more than a power
of 2 then a $\mathcal{PAI}$ function has algebraic degree $n-1$ (which is also balanced);
(3) otherwise, the function is not $\mathcal{PAI}$. We then prove that the Carlet-Feng
functions, which have algebraic degree $n-1$, are $\mathcal{PAI}$ for n equal to one more
than a power of 2, and are almost $\mathcal{PAI}$ for the other cases. Also we prove that
the modified Carlet-Feng functions, which have algebraic degree n, are $\mathcal{PAI}$ for
n equal to a power of 2, and are almost $\mathcal{PAI}$ for the other cases. The results
show that our bounds on the immunity to fast algebraic attacks are tight, and
that the Carlet-Feng functions are optimal against fast algebraic attacks as well
as classical algebraic attacks. Our results explain the experimental observations
of C. Carlet and K. Feng [4] and also prove their conjecture.

The remainder of this paper is organized as follows. In Section 2 some basic
concepts are provided. Section 3 presents the improved upper bounds on the
immunity of Boolean functions against fast algebraic attacks while Section 4 shows
that the Carlet-Feng functions and their modifications achieve these bounds.
Section 5 concludes the paper.

2 Preliminary 

Let $\mathbb{F}_2$ denote the binary field $\mathrm{GF}(2)$ and $\mathbb{F}_2^n$ the n-dimensional vector space over
$\mathbb{F}_2$. An n-variable Boolean function is a mapping from $\mathbb{F}_2^n$ into $\mathbb{F}_2$. Denote by
$\mathbf{B}_n$ the set of all n-variable Boolean functions. An n-variable Boolean function
f can be uniquely represented as its truth table, i.e., a binary string of length
$2^n$,

$$f = [f(0,0,\ldots,0),\ f(1,0,\ldots,0),\ \ldots,\ f(1,1,\ldots,1)].$$

The support of f is given by $\mathrm{supp}(f) = \{x \in \mathbb{F}_2^n \mid f(x) = 1\}$. The Hamming
weight of f, denoted by $\mathrm{wt}(f)$, is the number of ones in the truth table of f.
An n-variable function f is said to be balanced if its truth table contains an equal
number of zeros and ones, that is, $\mathrm{wt}(f) = 2^{n-1}$.

An n-variable Boolean function f can also be uniquely represented as a
multivariate polynomial over $\mathbb{F}_2$,

$$f(x) = \sum_{c \in \mathbb{F}_2^n} a_c x^c, \quad a_c \in \mathbb{F}_2,\quad x^c = x_1^{c_1} x_2^{c_2} \cdots x_n^{c_n},\quad c = (c_1, c_2, \ldots, c_n),$$

called the algebraic normal form (ANF). The algebraic degree of f, denoted by
$\deg(f)$, is defined as $\max\{\mathrm{wt}(c) \mid a_c \ne 0\}$.
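
As an illustration of the two representations above, the following Python sketch converts a truth table into ANF coefficients with the binary Moebius transform and reads off the algebraic degree. It is not part of the original paper; the function names `anf` and `degree` are illustrative, and the index convention (input $(x_1, \ldots, x_n)$ stored at position $x_1 + 2x_2 + \cdots + 2^{n-1}x_n$) is an assumption chosen to match the truth-table order given above.

```python
def anf(truth_table):
    """Binary Moebius transform: truth table -> list of ANF coefficients a_c.

    Entry x of the table holds f(x_1, ..., x_n) with x = x_1 + 2*x_2 + ...;
    entry c of the result is the coefficient of the monomial x^c.
    """
    a = list(truth_table)
    n = len(a).bit_length() - 1
    for k in range(n):
        step = 1 << k
        for x in range(len(a)):
            if x & step:
                a[x] ^= a[x ^ step]  # add the value with bit k cleared
    return a


def degree(coeffs):
    """Algebraic degree: largest Hamming weight of c with a_c != 0 (-1 for f = 0)."""
    return max((bin(c).count("1") for c, a in enumerate(coeffs) if a), default=-1)


# f(x1, x2, x3) = x1*x2 + x3, listed in the order f(0,0,0), f(1,0,0), f(0,1,0), ...
tt = [0, 0, 0, 1, 1, 1, 1, 0]
coeffs = anf(tt)          # nonzero exactly at c = 3 (x1*x2) and c = 4 (x3)
```

The transform is an involution over $\mathbb{F}_2$, so applying `anf` to the coefficient list returns the truth table again.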

Let $\mathbb{F}_{2^n}$ denote the finite field $\mathrm{GF}(2^n)$. The Boolean function f considered as
a mapping from $\mathbb{F}_{2^n}$ into $\mathbb{F}_2$ can be uniquely represented as

$$f(x) = \sum_{i=0}^{2^n-1} a_i x^i, \quad a_i \in \mathbb{F}_{2^n}, \tag{1}$$

where $f^2(x) \equiv f(x) \pmod{x^{2^n} - x}$. Expression (1) is called the univariate
polynomial representation of the function f. It is well known that
$f^2(x) \equiv f(x) \pmod{x^{2^n} - x}$ if and only if $a_0, a_{2^n-1} \in \mathbb{F}_2$ and, for $1 \le i \le 2^n - 2$,
$a_{2i \bmod (2^n-1)} = a_i^2$. The algebraic degree of the function f equals $\max_{a_i \ne 0} \mathrm{wt}(i)$,
where $i = \sum_{k=1}^{n} i_k 2^{k-1}$ is considered as $(i_1, i_2, \ldots, i_n) \in \mathbb{F}_2^n$.

Let $\alpha$ be a primitive element of $\mathbb{F}_{2^n}$. The $a_i$'s of Expression (1) are given by
$a_0 = f(0)$, $a_{2^n-1} = f(0) + \sum_{j=0}^{2^n-2} f(\alpha^j)$ and

$$a_i = \sum_{j=0}^{2^n-2} f(\alpha^j)\,\alpha^{-ij}, \quad \text{for } 1 \le i \le 2^n - 2. \tag{2}$$

For more details with regard to the representation of Boolean functions, we refer
to [3].
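
The coefficient formulas above can be tried out on a small field. The sketch below is an illustration only, not the authors' code: it assumes $\mathbb{F}_{2^3}$ built from the polynomial $x^3 + x + 1$ with $\alpha$ the class of x, stores field elements as integers, and uses hypothetical helper names (`gf_mul`, `gf_pow`, `univariate_coefficients`, `evaluate`). It computes the $a_i$ from the values of f, then checks that the polynomial reproduces f and that $a_{2i \bmod (2^n-1)} = a_i^2$.

```python
N, POLY = 3, 0b1011          # assumed field: GF(2^3) = F_2[x] / (x^3 + x + 1)
ORDER = (1 << N) - 1         # order of the multiplicative group
ALPHA = 0b010                # class of x; primitive because ORDER = 7 is prime


def gf_mul(a, b):
    """Multiply two field elements given as bit vectors of polynomial coefficients."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> N) & 1:
            a ^= POLY
    return r


def gf_pow(a, k):
    """a**k for a non-negative integer k (0**0 is taken to be 1)."""
    r = 1
    for _ in range(k):
        r = gf_mul(r, a)
    return r


def univariate_coefficients(values):
    """values[x] = f(x) in {0, 1} for every field element x; returns a_0..a_{2^n-1}."""
    powers = [gf_pow(ALPHA, j) for j in range(ORDER)]
    a = [0] * (ORDER + 1)
    a[0] = values[0]
    a[ORDER] = (values[0] + sum(values[p] for p in powers)) % 2
    for i in range(1, ORDER):
        for j, p in enumerate(powers):
            if values[p]:
                a[i] ^= gf_pow(ALPHA, (-i * j) % ORDER)  # alpha^{-ij}
    return a


def evaluate(a, x):
    """Evaluate sum_i a_i x^i at the field element x."""
    r = 0
    for i, c in enumerate(a):
        r ^= gf_mul(c, gf_pow(x, i))
    return r


values = [0, 1, 1, 0, 1, 0, 0, 1]   # an arbitrary Boolean function on GF(8)
coeffs = univariate_coefficients(values)
```

Field addition is XOR in this encoding, which is why the sums are accumulated with `^=`.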

The algebraic immunity of Boolean functions is defined as follows. Maximum
algebraic immunity of n-variable Boolean functions is $\lceil n/2 \rceil$ pj .

Definition 1. w The algebraic immunity of a function $f \in \mathbf{B}_n$, denoted by
$\mathcal{AI}(f)$, is defined as

$$\mathcal{AI}(f) = \min\{\deg(g) \mid gf = 0 \text{ or } g(f+1) = 0,\ 0 \ne g \in \mathbf{B}_n\}.$$
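
For small n, Definition 1 can be evaluated directly by linear algebra over $\mathbb{F}_2$. The sketch below is an illustrative brute-force approach, not the method used in the paper: a function g of degree at most d annihilates f exactly when its ANF coefficients solve the linear system "g(x) = 0 for all x in supp(f)", so a nonzero annihilator exists if and only if that system does not have full column rank. The helper names are hypothetical.

```python
def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot                      # lowest set bit of the pivot
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def has_annihilator(points, n, d):
    """True if some nonzero g with deg(g) <= d vanishes on every point."""
    monomials = [z for z in range(1 << n) if bin(z).count("1") <= d]
    rows = []
    for x in points:
        row = 0
        for j, z in enumerate(monomials):
            if (z & ~x) == 0:                         # monomial x^z equals 1 at x
                row |= 1 << j
        rows.append(row)
    return rank_gf2(rows) < len(monomials)


def algebraic_immunity(truth_table):
    """Smallest degree of a nonzero annihilator of f or of f + 1."""
    n = len(truth_table).bit_length() - 1
    ones = [x for x, v in enumerate(truth_table) if v]
    zeros = [x for x, v in enumerate(truth_table) if not v]
    for d in range(n + 1):
        if has_annihilator(ones, n, d) or has_annihilator(zeros, n, d):
            return d


majority3 = [0, 0, 0, 1, 0, 1, 1, 1]   # majority function on 3 variables
```

The cost grows quickly with n, so this is only meant for checking small examples by hand.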

The immunity of f against fast algebraic attacks is related to the algebraic degree
e of a function g and the algebraic degree d of gf with e < d. For an n-variable
function f and any positive integer e with $e < n/2$, there is a nonzero function g
of degree at most e such that gf has degree at most $n-e$ 1 1 . There are several
notions about the immunity of Boolean functions against fast algebraic attacks
in the previous literature, such as |LH21j . The perfect algebraic immune function
we define below is actually a Boolean function which is algebraic attack resistant
(see EH) and has degree at least $n-1$. The latter is necessary for a perfect algebraic
immune function since a function of degree less than $n-1$ admits $e = 1$ and
$d = \deg(f) < n-1 = n-e$ (taking g to be a nonzero constant).

Definition 2. Let f be an n-variable Boolean function. The function f is said 
to be perfect algebraic immune if for any positive integers e < n/2, the product 
gf has degree at least n — e for any nonzero function g of degree at most e. 

A perfect algebraic immune ($\mathcal{PAI}$) function achieves maximum $\mathcal{AI}$ and is
therefore a Boolean function perfectly resistant to classical and fast algebraic attacks.
As a matter of fact, if a function does not achieve maximum $\mathcal{AI}$, then it admits
a nonzero function g of degree less than n/2 such that gf = 0 or gf = g, which
means that it is not $\mathcal{PAI}$.

3 The Immunity of Boolean Functions against Fast
Algebraic Attacks

In this section, we present the upper bounds on the immunity of Boolean func- 
tions against fast algebraic attacks. We first recall the previous results for deter- 
mining the immunity against fast algebraic attacks, then state our bounds. 

Denote by $W_i$ the ordered set $\{x \in \mathbb{F}_2^n \mid \mathrm{wt}(x) \le i\}$ in lexicographic order and
by $\overline{W}_i$ the ordered set $\{x \in \mathbb{F}_2^n \mid \mathrm{wt}(x) \ge i+1\}$ in the reverse of lexicographic
order. According to the definitions of $W_i$ and $\overline{W}_i$, it follows that if x is the
j-th element in $W_e$, then $\bar{x}$ is the j-th element in $\overline{W}_{n-e-1}$, where
$\bar{x} = (x_1 + 1, \ldots, x_n + 1)$. Here are some additional notational conventions: for
$y, z \in \mathbb{F}_2^n$, let $z \subseteq y$ be an abbreviation for $\mathrm{supp}(z) \subseteq \mathrm{supp}(y)$, where
$\mathrm{supp}(x) = \{i \mid x_i = 1\}$, and let $y \cap z = (y_1 \wedge z_1, \ldots, y_n \wedge z_n)$,
$y \cup z = (y_1 \vee z_1, \ldots, y_n \vee z_n)$, where $\wedge$ and $\vee$ are the AND and OR operations
respectively. We can see that $z \subseteq y$ if and only if $y^z = y_1^{z_1} y_2^{z_2} \cdots y_n^{z_n} = 1$.

Let g be a function of algebraic degree at most e ($e < n/2$) such that h = gf
has algebraic degree at most d ($e \le d$). Let

$$f(x) = \sum_{c \in \mathbb{F}_2^n} f_c x^c, \quad f_c \in \mathbb{F}_2,$$

$$g(x) = \sum_{z \in W_e} g_z x^z, \quad g_z \in \mathbb{F}_2,$$

and

$$h(x) = \sum_{y \in W_d} h_y x^y, \quad h_y \in \mathbb{F}_2,$$

be the ANFs of f, g and h respectively. For $y \in \overline{W}_d$, we have $h_y = 0$ and
therefore

$$0 = h_y = \sum_{\substack{c \cup z = y \\ c \in \mathbb{F}_2^n,\ z \in W_e}} f_c\, g_z = \sum_{z \in W_e} g_z \sum_{\substack{c \cup z = y \\ c \in \mathbb{F}_2^n}} f_c. \tag{3}$$

The above equations on the $g_z$'s are homogeneous linear. Denote by $V(f;e,d)$ the
coefficient matrix of the equations, which is a $\sum_{i=d+1}^{n}\binom{n}{i} \times \sum_{i=0}^{e}\binom{n}{i}$ matrix
with the (i,j)-th element equal to

$$v_{yz} = \sum_{\substack{c \cup z = y \\ c \in \mathbb{F}_2^n}} f_c = \sum_{\substack{y \cap \bar{z} \subseteq c \subseteq y \\ z \subseteq y}} f_c = y^z \sum_{y \cap \bar{z} \subseteq c \subseteq y} f_c, \tag{4}$$

where y is the i-th element in $\overline{W}_d$ and z is the j-th element in $W_e$. Then f
admits no nonzero function g of algebraic degree at most e such that h = gf has
algebraic degree at most d if and only if the rank of the matrix $V(f;e,d)$ equals
the number of $g_z$'s, which is $\sum_{i=0}^{e}\binom{n}{i}$, i.e., $V(f;e,d)$ has full column rank (see
also [2,10]).

Theorem 1. [2,10] Let $f \in \mathbf{B}_n$ and $\sum_{c \in \mathbb{F}_2^n} f_c x^c$ be the ANF of f. Let $V(f;e,d)$
be the matrix whose (i,j)-th element equals $\sum_{c \cup z = y} f_c$, where y is the i-th
element in $\overline{W}_d$ and z is the j-th element in $W_e$.

Then there exists no nonzero function g of degree at most e such that the
product gf has degree at most d if and only if the matrix $V(f;e,d)$ has full
column rank.

Remark 1. The theorem shows that $\mathcal{AI}(f) > e$ if and only if the matrix $V(f;e,e)$
has full column rank (since $\mathcal{AI}(f) > e$ if and only if there exists no nonzero
function g of degree at most e such that h = gf has degree at most e). Then
$\mathcal{AI}(f) = \lceil n/2 \rceil$ if and only if the matrix $V(f; \lceil n/2 \rceil - 1, \lceil n/2 \rceil - 1)$ has full column
rank.

Now we show that performing some column operations on the matrix $V(f;e,d)$
creates a matrix with $f_c$'s as its elements.

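Lemma 2. $\sum_{z^* \subseteq z} v_{yz^*} = f_{y \cap \bar{z}}$.

Proof. Note that $c \cup z = y$ if and only if $c \subseteq y$, $z \subseteq y$ and $y \subseteq c \cup z$, that is,
$y^c = 1$, $y^z = 1$ and $(c \cup z)^y = 1$. By (4) we have

$$\begin{aligned}
\sum_{z^* \subseteq z} v_{yz^*} &= \sum_{z^* \subseteq z}\ \sum_{c \cup z^* = y} f_c \\
&= \sum_{z^* \subseteq z}\ \sum_{c \in \mathbb{F}_2^n} y^c\, y^{z^*} (c \cup z^*)^y f_c \\
&= \sum_{c \in \mathbb{F}_2^n} y^c f_c \sum_{z^* \subseteq z} y^{z^*} (c \cup z^*)^y \\
&= \sum_{c \subseteq y} f_c \sum_{\substack{z^* \subseteq y \cap z \\ y \subseteq c \cup z^*}} 1 \\
&= \sum_{c \subseteq y} f_c \sum_{y \cap \bar{c} \subseteq z^* \subseteq y \cap z} 1 \\
&= \sum_{c \subseteq y,\ y \cap \bar{c} = y \cap z} f_c \\
&= f_{y \cap \bar{z}}.
\end{aligned}$$

□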

Lemma 2 shows that the matrix $V(f;e,d)$ can be transformed into a matrix,
denoted by $W(f;e,d)$, with the (i,j)-th element equal to

$$w_{yz} = f_{y \cap \bar{z}}, \tag{5}$$

where y is the i-th element in $\overline{W}_d$ and z is the j-th element in $W_e$.

For $d = n-e-1$, the (j,i)-th element of $W(f;e,d)$ is equal to

$$w_{\bar{z}\bar{y}} = f_{\bar{z} \cap y} = f_{y \cap \bar{z}} = w_{yz},$$

since $\bar{z}$ is the j-th element in $\overline{W}_d$ and $\bar{y}$ is the i-th element in $W_e$ by the definitions
of $\overline{W}_d$ and $W_e$. Recall that $V(f;e,d)$ and $W(f;e,d)$ are $\sum_{i=d+1}^{n}\binom{n}{i} \times \sum_{i=0}^{e}\binom{n}{i}$
matrices. Therefore the matrix $W(f;e,n-e-1)$ is a symmetric $\sum_{i=0}^{e}\binom{n}{i} \times \sum_{i=0}^{e}\binom{n}{i}$
matrix, denoted by $W(f;e)$.

Theorem 3. Let $f \in \mathbf{B}_n$ and $\sum_{c \in \mathbb{F}_2^n} f_c x^c$ be the ANF of f. Let $W(f;e,d)$ be
the matrix whose (i,j)-th element equals $f_{y \cap \bar{z}}$, where y is the i-th element in $\overline{W}_d$
and z is the j-th element in $W_e$.

Then there exists no nonzero function g of degree at most e such that gf has
degree at most d if and only if $W(f;e,d)$ has full column rank.

Proof. Lemma 2 shows that $V(f;e,d)$ and $W(f;e,d)$ have the same rank. Then
the theorem follows from Theorem 1. □
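
The column transformation behind Lemma 2 and Theorem 3 can be checked numerically for small n. The sketch below is an illustration under stated assumptions, not the authors' code: vectors of $\mathbb{F}_2^n$ are stored as integer bitmasks, the ANF coefficient $f_c$ is stored at index c, and the helper names are hypothetical. It builds both matrices for random ANFs, verifies the identity of Lemma 2 entry by entry, and compares the two ranks.

```python
import random


def weight(x):
    return bin(x).count("1")


def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def build_v_and_w(f, n, e, d):
    """Entries of V(f;e,d) and W(f;e,d), indexed by (y, z)."""
    cols = [z for z in range(1 << n) if weight(z) <= e]       # W_e
    rows = [y for y in range(1 << n) if weight(y) >= d + 1]   # complement set of W_d
    v = {(y, z): sum(f[c] for c in range(1 << n) if (c | z) == y) % 2
         for y in rows for z in cols}
    w = {(y, z): f[y & ~z] for y in rows for z in cols}       # f_{y AND (NOT z)}
    return rows, cols, v, w


def lemma2_holds(rows, cols, v, w):
    """Check that summing v over all z* contained in z gives w, for every (y, z)."""
    for y in rows:
        for z in cols:
            total = 0
            for zs in cols:
                if (zs & ~z) == 0:
                    total ^= v[(y, zs)]
            if total != w[(y, z)]:
                return False
    return True


def as_bitmasks(entries, rows, cols):
    return [sum(entries[(y, z)] << j for j, z in enumerate(cols)) for y in rows]
```

The order of rows and columns does not affect the rank, so the sketch does not reproduce the lexicographic orderings used in the text.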

Remark 2. The theorem shows that $\mathcal{AI}(f) > e$ if and only if the matrix
$W(f;e,e)$ has full column rank. Then $\mathcal{AI}(f) = \lceil n/2 \rceil$ if and only if the matrix
$W(f; \lceil n/2 \rceil - 1, \lceil n/2 \rceil - 1)$ has full column rank.

Next we concentrate on the upper bounds on the immunity of Boolean functions
against fast algebraic attacks. As mentioned in Section 2, for an n-variable
function f and any positive integer e with $e < n/2$, there is a nonzero function g of
degree at most e such that gf has degree at most $n-e$. This can also be explained
by Theorem 1 or Theorem 3: the matrices $V(f;e,n-e)$ and $W(f;e,n-e)$ do
not have full column rank since they are $\sum_{i=0}^{e-1}\binom{n}{i} \times \sum_{i=0}^{e}\binom{n}{i}$ matrices. From
Theorem 3, the bounds on the immunity to fast algebraic attacks are related to
the question whether the symmetric matrix $W(f;e)$ is invertible.

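Before stating our main results, we list a useful lemma about the determinant
of a symmetric matrix over a field with characteristic 2.

Lemma 4. Let $A = (a_{ij})_{m \times m}$ be a symmetric $m \times m$ matrix over a field with
characteristic 2, and $a_{ii} = a_{1i}^2$ for $2 \le i \le m$, that is,

$$A = \begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1m} \\
a_{12} & a_{12}^2 & a_{23} & \cdots & a_{2m} \\
a_{13} & a_{23} & a_{13}^2 & \cdots & a_{3m} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{1m} & a_{2m} & a_{3m} & \cdots & a_{1m}^2
\end{pmatrix}. \tag{6}$$

If $a_{11} = (m+1) \bmod 2$, then $\det(A) = 0$.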
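
Over the prime field $\mathbb{F}_2$ the condition $a_{ii} = a_{1i}^2$ reduces to $a_{ii} = a_{1i}$, so Lemma 4 can be checked exhaustively for small m. The following sketch is only a sanity check under that restriction (the lemma itself holds over any field of characteristic 2); the helper names are illustrative.

```python
from itertools import product


def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def lemma4_matrices(m):
    """Yield every symmetric m x m matrix over GF(2) of the form (6)
    with a_11 = (m + 1) mod 2."""
    pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]
    for bits in product((0, 1), repeat=len(pairs)):
        a = [[0] * m for _ in range(m)]
        for (i, j), b in zip(pairs, bits):
            a[i][j] = a[j][i] = b
        a[0][0] = (m + 1) % 2
        for i in range(1, m):
            a[i][i] = a[0][i]          # a_ii = a_1i^2 = a_1i in GF(2)
        yield a


def is_singular(a):
    masks = [sum(v << j for j, v in enumerate(row)) for row in a]
    return rank_gf2(masks) < len(a)


all_singular = all(is_singular(a) for m in range(1, 6) for a in lemma4_matrices(m))
```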

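Proof. Let $S_m$ be the symmetric group of degree m. Then

$$\begin{aligned}
\det(A) &= \sum_{\sigma \in S_m} \prod_{i=1}^{m} a_{i,\sigma(i)} \\
&= \sum_{\sigma \in S_m,\ \sigma^2 = 1} \prod_{i=1}^{m} a_{i,\sigma(i)} + \sum_{\sigma \in S_m,\ \sigma^2 \ne 1} \prod_{i=1}^{m} a_{i,\sigma(i)} \\
&= \sum_{\sigma \in S_m,\ \sigma^2 = 1} \prod_{i=1}^{m} a_{i,\sigma(i)}
\end{aligned}$$

(since $\prod_{i=1}^{m} a_{i,\sigma^{-1}(i)} = \prod_{i=1}^{m} a_{\sigma(i),i} = \prod_{i=1}^{m} a_{i,\sigma(i)}$).

If m is odd, then $a_{11} = 0$ and therefore

$$\begin{aligned}
\det(A) &= \sum_{j=2}^{m}\ \sum_{\sigma^2 = 1,\ \sigma(1) = j} a_{1j} \prod_{i=2}^{m} a_{i,\sigma(i)} \\
&= \sum_{j=2}^{m}\ \sum_{\sigma^2 = 1,\ \sigma(1) = j} a_{1j}^2 \prod_{2 \le i \le m,\ i \ne j} a_{i,\sigma(i)}
\end{aligned}$$

(for odd m and $\sigma^2 = 1$, there is $j'$ such that $j' \ne j$ and $\sigma(j') = j'$)

$$= \sum_{j=2}^{m}\ \sum_{\sigma^2 = 1,\ \sigma(1) = j} a_{1j}^2\, a_{1j'}^2 \prod_{2 \le i \le m,\ i \notin \{j, j'\}} a_{i,\sigma(i)}$$

(there is a unique $\sigma'$ such that $\sigma'(1) = j'$, $\sigma'(j') = 1$, $\sigma'(j) = j$,
and $\sigma'(i) = \sigma(i)$ for $i \notin \{1, j, j'\}$)

$$= 0.$$

If m is even, then $a_{11} = 1$ and therefore

$$\begin{aligned}
\det(A) &= \sum_{\sigma^2 = 1,\ \sigma(1) = 1} \prod_{i=2}^{m} a_{i,\sigma(i)} + \sum_{j=2}^{m}\ \sum_{\sigma^2 = 1,\ \sigma(1) = j} a_{1j}^2 \prod_{2 \le i \le m,\ i \ne j} a_{i,\sigma(i)} \\
&= \sum_{j=2}^{m}\ \sum_{\substack{\sigma^2 = 1,\ \sigma(1) = 1 \\ \sigma(j) = j}} a_{1j}^2 \prod_{2 \le i \le m,\ i \ne j} a_{i,\sigma(i)} + \sum_{j=2}^{m}\ \sum_{\sigma^2 = 1,\ \sigma(1) = j} a_{1j}^2 \prod_{2 \le i \le m,\ i \ne j} a_{i,\sigma(i)} \\
&= 0.
\end{aligned}$$

□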

Remark 3. For the matrix A of Lemma 4 it holds that $\det(A) = \det(A^{(1,1)})$ if
$a_{11} = m \bmod 2$, where $A^{(i,j)}$ is the $(m-1) \times (m-1)$ matrix that results from A
by removing the i-th row and the j-th column.

Theorem 5. Let $f \in \mathbf{B}_n$ and $f_{2^n-1}$ be the coefficient of the monomial
$x_1 x_2 \cdots x_n$ in the ANF of f. Let e be a positive integer less than n/2. If
$f_{2^n-1} = \binom{n-1}{e} + 1 \bmod 2$, then there exists a nonzero function g with degree
at most e such that gf has degree at most $n-e-1$.

Proof. According to Theorem 3, we need to prove that the square matrix $W(f;e)$
is singular when $f_{2^n-1} = \binom{n-1}{e} + 1 \bmod 2$. Let $w_{ij}$ be the (i,j)-th element of
$W(f;e)$. Since $\mathbf{1} = (1, 1, \ldots, 1)$ and $\mathbf{0} = (0, 0, \ldots, 0)$ are the first elements in
$\overline{W}_{n-e-1}$ and $W_e$ respectively, by (5) we have $w_{11} = w_{\mathbf{1},\mathbf{0}} = f_{2^n-1}$. Because

$$\sum_{i=0}^{e} \binom{n}{i} = \sum_{i=1}^{e} \binom{n-1}{i} + \sum_{i=1}^{e} \binom{n-1}{i-1} + 1 \equiv \binom{n-1}{e} \pmod 2,$$

we know $f_{2^n-1} = \sum_{i=0}^{e} \binom{n}{i} + 1 \bmod 2$ when $f_{2^n-1} = \binom{n-1}{e} + 1 \bmod 2$. As mentioned
previously, $W(f;e)$ is a symmetric $\sum_{i=0}^{e}\binom{n}{i} \times \sum_{i=0}^{e}\binom{n}{i}$ matrix over $\mathbb{F}_2$. We wish to show
that $W(f;e)$ has the form of (6). By (5) we have

$$w_{1i}^2 = w_{1i} = w_{\mathbf{1}z} = f_{\mathbf{1} \cap \bar{z}} = f_{\bar{z}} = f_{\bar{z} \cap \bar{z}} = w_{\bar{z}z} = w_{ii},$$

where $\bar{z}$ is the i-th element in $\overline{W}_{n-e-1}$ and z is the i-th element in $W_e$.
It follows from Lemma 4 that the matrix $W(f;e)$ is singular. □

Corollary 6. Let n be an even number and $f \in \mathbf{B}_n$. If f is balanced, then there
exists a nonzero function g with degree at most 1 such that the product gf has
degree at most $n-2$.

Proof. If f is balanced, then $f_{2^n-1} = 0$. For even n, it holds that $\binom{n-1}{1} + 1 \equiv 0 \pmod 2$.
Therefore the result follows from Theorem 5. □

From Corollary 6 it seems that for the number n of input variables, odd numbers
are better than even ones from a cryptographic point of view (since cryptographic
functions must be balanced).

Lucas' theorem states that for positive integers m and i, the following
congruence relation holds:

$$\binom{m}{i} \equiv \prod_{k=1}^{s} \binom{m_k}{i_k} \pmod 2,$$

where $m = \sum_{k=1}^{s} m_k 2^{k-1}$ and $i = \sum_{k=1}^{s} i_k 2^{k-1}$ are the binary expansions of m
and i respectively. It means that $\binom{m}{i} \bmod 2 = 1$ if and only if $i \subseteq m$.
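
The consequence of Lucas' theorem used here, that $\binom{m}{i}$ is odd exactly when the binary expansion of i is contained in that of m, is easy to confirm numerically. The short check below is an illustration; `binom_is_odd` is a hypothetical helper name.

```python
from math import comb


def binom_is_odd(m, i):
    """Lucas' theorem modulo 2: C(m, i) is odd iff every set bit of i is set in m."""
    return (i & ~m) == 0


# Compare the bit test with the actual binomial coefficients for small arguments.
lucas_ok = all(
    (comb(m, i) % 2 == 1) == binom_is_odd(m, i)
    for m in range(1, 128)
    for i in range(0, 128)
)
```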

Note that $f_{2^n-1} = 1$ if and only if $\deg(f) = n$. Theorem 5 shows that for an
n-variable function f of degree n and $e \not\subseteq n-1$, there is a nonzero function g
of degree at most e such that gf has degree at most $n-e-1$, and that for an
n-variable function f of degree less than n and $e \subseteq n-1$, there is a nonzero
function g of degree at most e such that gf has degree at most $n-e-1$.

For the case $n-1 \notin \{2^s, 2^s-1\}$, there are integers $e, e^*$ with $0 < e, e^* < n/2$
such that $e \subseteq n-1$ and $e^* \not\subseteq n-1$, and thus an n-variable function is not $\mathcal{PAI}$.
This shows that for a $\mathcal{PAI}$ function the number n of input variables is $2^s + 1$ or
$2^s$. For $n = 2^s + 1$ (resp. $2^s$), it holds that $e \not\subseteq n-1$ (resp. $e \subseteq n-1$) for any positive
integer $e < n/2$, and thus an n-variable function with degree equal to n (resp.
less than n) is not $\mathcal{PAI}$. Recall that a function on an odd number of variables with
maximum $\mathcal{AI}$ is always balanced 0. For $n = 2^s + 1$, a $\mathcal{PAI}$ function has degree
$n-1$ and is balanced since it has maximum $\mathcal{AI}$. For $n = 2^s$, a $\mathcal{PAI}$ function has
degree n and is then unbalanced, since a function has an odd Hamming weight
if and only if it has degree n. Consequently the following theorem is obtained.

Theorem 7. Let $f \in \mathbf{B}_n$ be a perfect algebraic immune function. Then n is one
more than or equal to a power of 2. Further, if f is balanced, then n is one more
than a power of 2; if f is unbalanced, then n is a power of 2.
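
The parity argument leading to Theorem 7 can be replayed mechanically. The sketch below is illustrative only: for each n it lists the parities of $\binom{n-1}{e}$ for all positive $e < n/2$ and reports whether the obstruction of Theorem 5 leaves room for a perfect algebraic immune function of degree n (all parities odd) or of degree less than n (all parities even). The function name and the dictionary keys are hypothetical.

```python
from math import comb


def pai_candidates(n):
    """Which degree classes are not excluded by Theorem 5 for n variables."""
    es = range(1, (n + 1) // 2)                 # positive integers e with e < n/2
    parities = [comb(n - 1, e) % 2 for e in es]
    return {
        "degree_n": all(p == 1 for p in parities),       # needs f_{2^n-1} = 1
        "degree_below_n": all(p == 0 for p in parities), # needs f_{2^n-1} = 0
    }


def is_power_of_two(k):
    return k > 0 and (k & (k - 1)) == 0


# n values (3 <= n <= 64) that survive in each class.
unbalanced_ok = [n for n in range(3, 65) if pai_candidates(n)["degree_n"]]
balanced_ok = [n for n in range(3, 65) if pai_candidates(n)["degree_below_n"]]
```

Only necessary conditions are tested here; existence for the surviving values of n is the subject of Section 4.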

4 The Immunity of Boolean Functions against Fast
Algebraic Attacks Using Univariate Polynomial
Representation

In this section we focus on the immunity of Boolean functions against fast
algebraic attacks using univariate polynomial representation and show that the
bounds presented in Section 3 can be achieved.

Recall that $W_e$ is the ordered set $\{x \in \mathbb{F}_2^n \mid \mathrm{wt}(x) \le e\}$ in lexicographic order
and $\overline{W}_d$ is the ordered set $\{x \in \mathbb{F}_2^n \mid \mathrm{wt}(x) \ge d+1\}$ in the reverse of lexicographic
order. Hereinafter, an element $x = (x_1, x_2, \ldots, x_n)$ in $W_e$ or $\overline{W}_d$ is considered as
an integer $x_1 + x_2 2 + \cdots + x_n 2^{n-1}$ from 0 to $2^n - 1$, and the operations "+" and
"−" may be considered as addition and subtraction operations modulo $2^n - 1$
respectively if there is no ambiguity.

Let f, g and h be n-variable Boolean functions, and let g be a function of
algebraic degree at most e ($e < n/2$) satisfying that h = gf has algebraic degree
at most d ($e \le d$). Let

$$f(x) = \sum_{i=0}^{2^n-1} f_i x^i, \quad f_i \in \mathbb{F}_{2^n},$$

$$g(x) = \sum_{z \in W_e} g_z x^z, \quad g_z \in \mathbb{F}_{2^n},$$

and

$$h(x) = \sum_{y \in W_d} h_y x^y, \quad h_y \in \mathbb{F}_{2^n},$$

be the univariate polynomial representations of f, g and h respectively. For
$y \in \overline{W}_d$, we have $h_y = 0$ and thus

$$0 = h_y = \sum_{\substack{i + z = y \\ z \in W_e}} f_i\, g_z = \sum_{z \in W_e} f_{y-z}\, g_z. \tag{7}$$

The above equations on the $g_z$'s are homogeneous linear. Denote by $U(f;e,d)$ the
coefficient matrix of the equations, which is a $\sum_{i=d+1}^{n}\binom{n}{i} \times \sum_{i=0}^{e}\binom{n}{i}$ matrix
with the (i,j)-th element equal to

$$u_{yz} = f_{y-z}, \tag{8}$$

where y is the i-th element in $\overline{W}_d$ and z is the j-th element in $W_e$. More precisely,
for $(i,j) = (1,1)$ we have $(y,z) = (2^n - 1, 0)$ and $u_{yz} = f_{2^n-1}$; for $(i,j) \ne (1,1)$
we have $y - z \notin \{0, 2^n - 1\}$ and $u_{yz} = f_{(y-z) \bmod (2^n-1)}$ when $e \le d$.

If the matrix $U(f;e,d)$ has full column rank, i.e., the rank of $U(f;e,d)$ equals
the number of $g_z$'s, then f admits no nonzero function g of algebraic degree at
most e such that h = gf has algebraic degree at most d.

If the matrix $U(f;e,d)$ does not have full column rank, then there always
exists a nonzero Boolean function satisfying Equations (7). More precisely, if
$g(x) = \sum_{z \in W_e} g_z x^z$ ($g_z \in \mathbb{F}_{2^n}$) satisfies (7), then

$$0 = h_y^2 = \sum_{z \in W_e} f_{y-z}^2\, g_z^2 = \sum_{z \in W_e} f_{2y-2z}\, g_z^2, \quad y \in \overline{W}_d, \tag{9}$$

where $f_{2(2^n-1)} = f_{2^n-1}$ and $f_{2i}$ is considered as $f_{2i \bmod (2^n-1)}$ for $i \ne 2^n - 1$, and
thus $g^2(x) = \sum_{z \in W_e} g_z^2 x^{2z} \bmod (x^{2^n} - x)$ satisfies (7). Note that the system of (7)
and the system of (9) are actually the same. Therefore, if $g(x)$ satisfies Equations
(7) then $\mathrm{Tr}(g(x))$ satisfies Equations (7), where $\mathrm{Tr}(x) = x + x^2 + \cdots + x^{2^{n-1}}$. Also
it follows that if $g(x)$ satisfies Equations (7) then $\beta g(x)$ and $\mathrm{Tr}(\beta g(x))$ satisfy
Equations (7) for any $\beta \in \mathbb{F}_{2^n}$. If $g(x) \ne 0$, then there is $c \in \mathbb{F}_{2^n}$ such that
$g(c) \ne 0$, and there is $\beta \in \mathbb{F}_{2^n}$ such that $\mathrm{Tr}(\beta g(c)) \ne 0$ and thus $\mathrm{Tr}(\beta g(x)) \ne 0$.
Now we can see that $\mathrm{Tr}(\beta g(x))$ is a nonzero Boolean function and satisfies (7).
Hence if there is a nonzero solution for (7), then there always exists a nonzero
Boolean function g satisfying (7).

Thus the following theorem is obtained.

Theorem 8. Let $f \in \mathbf{B}_n$ and $\sum_{i=0}^{2^n-1} f_i x^i$ be the univariate polynomial
representation of f. Let $U(f;e,d)$ be the matrix whose (i,j)-th element equals $f_{y-z}$,
where y is the i-th element in $\overline{W}_d$ and z is the j-th element in $W_e$.

Then there exists no nonzero function g of algebraic degree at most e such that
the product gf has algebraic degree at most d if and only if the matrix $U(f;e,d)$
has full column rank.

Remark 4. As described at the beginning of this section, the sets $W_e$ and $\overline{W}_d$
of Theorem 8 are subsets of $\{0, 1, \ldots, 2^n - 1\}$, while the sets $W_e$ and $\overline{W}_d$ of
Theorem 1 and Theorem 3 are subsets of $\mathbb{F}_2^n$.

Remark 5. The theorem gives a method using one matrix to evaluate the
immunity of Boolean functions against fast algebraic attacks based on univariate
polynomial representation, while in [22] P. Rizomiliotis used three matrices.

Remark 6. The theorem shows that $\mathcal{AI}(f) > e$ if and only if the matrix $U(f;e,e)$
has full column rank. Then $\mathcal{AI}(f) = \lceil n/2 \rceil$ if and only if the matrix
$U(f; \lceil n/2 \rceil - 1, \lceil n/2 \rceil - 1)$ has full column rank.

Remark 7. The matrix $U(f;e,n-e-1)$, denoted by $U(f;e)$, is symmetric since

$$u_{\bar{z}\bar{y}} = f_{\bar{z}-\bar{y}} = f_{(2^n-1-z)-(2^n-1-y)} = f_{y-z} = u_{yz}.$$

Further, we have

$$u_{y\bar{y}} = f_{y-\bar{y}} = f_{y-(2^n-1-y)} = f_{2y} = f_y^2 = u_{y,0}^2,$$

and therefore $U(f;e)$ has the form of (6). Hence Theorem 5 can also be derived
from Theorem 8 and Lemma 4.

4.1 Carlet-Feng Functions

The class of the Carlet-Feng functions was first presented in [11] and further
studied by C. Carlet and K. Feng [4]. Such functions have maximum algebraic
immunity and good nonlinearity. It was observed through computer experiments
by Armknecht's algorithm [2] that the functions also have good behavior against
fast algebraic attacks. In E23. Rizomiliotis determined the immunity of the
Carlet-Feng functions against fast algebraic attacks by computing the linear
complexity of a sequence, which is more efficient than Armknecht's algorithm
but is not yet feasible for large n. In this section, we further discuss the immunity
of the Carlet-Feng functions against fast algebraic attacks and prove that the
functions achieve the bounds of Theorem 5.

Let n be an integer and $\alpha$ a primitive element of $\mathbb{F}_{2^n}$. Let $f \in \mathbf{B}_n$ and

$$\mathrm{supp}(f) = \{\alpha^l, \alpha^{l+1}, \alpha^{l+2}, \ldots, \alpha^{l+2^{n-1}-1}\}, \quad 0 \le l \le 2^n - 2. \tag{10}$$

Then $\mathcal{AI}(f) = \lceil n/2 \rceil$ according to jTTiBl . As a matter of fact, the support of
the function $f(\alpha^{l+2^{n-1}} x) + 1$ is $\{0, 1, \alpha, \ldots, \alpha^{2^{n-1}-2}\}$, which is a Carlet-Feng
function. It means that these functions are affine equivalent.

A proof similar to that of [4, Theorem 2] applies to the following result. Here we
give a proof for completeness.

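Proposition 9. Let $\sum_{i=0}^{2^n-1} f_i x^i$ be the univariate polynomial representation of
the function f of (10). Then $f_0 = 0$, $f_{2^n-1} = 0$, and for $1 \le i \le 2^n - 2$,

$$f_i = \frac{\alpha^{-il}}{1 + \alpha^{-i/2}}.$$

Hence the algebraic degree of f is equal to $n-1$.

Proof. We have $f_0 = f(0) = 0$ and $f_{2^n-1} = 0$ since f has even Hamming weight
and thus algebraic degree less than n. For $1 \le i \le 2^n - 2$, by (2) we have

$$f_i = \sum_{j=0}^{2^n-2} f(\alpha^j)\,\alpha^{-ij} = \sum_{j=l}^{l+2^{n-1}-1} \alpha^{-ij} = \alpha^{-il} \sum_{j=0}^{2^{n-1}-1} \alpha^{-ij}
= \alpha^{-il}\,\frac{1 + \alpha^{-i2^{n-1}}}{1 + \alpha^{-i}} = \alpha^{-il}\,\frac{1 + \alpha^{-i/2}}{1 + \alpha^{-i}} = \frac{\alpha^{-il}}{1 + \alpha^{-i/2}}.$$

We can see that $f_{2^n-2} \ne 0$ and therefore f has algebraic degree $n-1$. □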
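
The closed form in Proposition 9 can be compared with the defining sum (2) on a small field. The sketch below is an illustration under stated assumptions, not the authors' code: it uses $\mathbb{F}_{2^4}$ built from $x^4 + x + 1$ with $\alpha$ the class of x, reads $\alpha^{-i/2}$ as $\alpha^{-i \cdot 2^{n-1}}$ (the unique square root of $\alpha^{-i}$), and uses hypothetical helper names.

```python
N, POLY = 4, 0b10011         # assumed field: GF(2^4) = F_2[x] / (x^4 + x + 1)
ORDER = (1 << N) - 1
ALPHA = 0b0010               # class of x, a primitive element for this polynomial


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> N) & 1:
            a ^= POLY
    return r


def gf_pow(a, k):
    """a**k for a non-negative integer k."""
    r = 1
    for _ in range(k):
        r = gf_mul(r, a)
    return r


def gf_inv(a):
    return gf_pow(a, ORDER - 1)          # valid for a != 0


def coefficient_by_sum(i, l):
    """f_i from formula (2) for the support {alpha^l, ..., alpha^(l+2^(n-1)-1)}."""
    s = 0
    for k in range(1 << (N - 1)):
        j = (l + k) % ORDER
        s ^= gf_pow(ALPHA, (-i * j) % ORDER)
    return s


def coefficient_closed_form(i, l):
    """alpha^(-il) / (1 + alpha^(-i/2)) for 1 <= i <= 2^n - 2."""
    numerator = gf_pow(ALPHA, (-i * l) % ORDER)
    half = gf_pow(ALPHA, (-i * (1 << (N - 1))) % ORDER)   # alpha^(-i/2)
    return gf_mul(numerator, gf_inv(1 ^ half))


agree = all(
    coefficient_by_sum(i, l) == coefficient_closed_form(i, l)
    for l in range(ORDER)
    for i in range(1, ORDER)
)
```

Since every $f_i$ with $1 \le i \le 2^n - 2$ is nonzero and $f_{2^n-1} = 0$, the largest weight of an index with a nonzero coefficient is $n - 1$, in line with the degree statement.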

Remark 8. For the function f of (10), the (i,j)-th element of the matrix
$U(f;e,d)$ with $e \le d$ is equal to

$$u_{yz} = f_{y-z} = \frac{\alpha^{-yl}\,\alpha^{zl}}{1 + \alpha^{-y/2}\,\alpha^{z/2}}, \quad \text{for } (i,j) \ne (1,1),$$

where y is the i-th element in $\overline{W}_d$ and z is the j-th element in $W_e$.

Lemma 10. Let K be a field of characteristic 2. Let $A = (a_{ij})_{m \times m}$ be an $m \times m$
matrix over K and $a_{ij} = (1 + \beta_i\gamma_j)^{-1}$, $\beta_i, \gamma_j \in K$ and $\beta_i\gamma_j \ne 1$, $1 \le i,j \le m$.
Then the determinant of A is equal to

$$\prod_{1 \le i < j \le m} (\beta_i + \beta_j)(\gamma_i + \gamma_j) \prod_{1 \le i,j \le m} a_{ij}.$$

Furthermore, the determinant of A is nonzero if and only if $\beta_i \ne \beta_j$ and $\gamma_i \ne \gamma_j$
for $i \ne j$.
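
A small numerical check of the determinant formula of Lemma 10 may help when reading the induction that follows. The sketch is illustrative and rests on stated assumptions: it works in $\mathbb{F}_{2^4}$ built from $x^4 + x + 1$, expands the determinant by the Leibniz formula (in characteristic 2 all signs are +1), and uses hypothetical helper names and arbitrarily chosen $\beta_i$, $\gamma_j$ with $\beta_i\gamma_j \ne 1$.

```python
from itertools import permutations

N, POLY = 4, 0b10011         # assumed field: GF(2^4) = F_2[x] / (x^4 + x + 1)
ORDER = (1 << N) - 1


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> N) & 1:
            a ^= POLY
    return r


def gf_pow(a, k):
    r = 1
    for _ in range(k):
        r = gf_mul(r, a)
    return r


def gf_inv(a):
    return gf_pow(a, ORDER - 1)          # valid for a != 0


def det_char2(matrix):
    """Leibniz expansion of the determinant; no signs in characteristic 2."""
    total = 0
    for perm in permutations(range(len(matrix))):
        term = 1
        for i, j in enumerate(perm):
            term = gf_mul(term, matrix[i][j])
        total ^= term
    return total


def lemma10_matrix(betas, gammas):
    return [[gf_inv(1 ^ gf_mul(b, g)) for g in gammas] for b in betas]


def lemma10_formula(betas, gammas):
    prod = 1
    m = len(betas)
    for i in range(m):
        for j in range(i + 1, m):
            prod = gf_mul(prod, gf_mul(betas[i] ^ betas[j], gammas[i] ^ gammas[j]))
    for row in lemma10_matrix(betas, gammas):
        for entry in row:
            prod = gf_mul(prod, entry)
    return prod


betas = [gf_pow(2, k) for k in (1, 2, 3)]    # alpha, alpha^2, alpha^3
gammas = [gf_pow(2, k) for k in (4, 5, 6)]   # products beta*gamma are never 1
```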

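Proof. The second half of this lemma is derived from the first half.
The proof of the first half is given by induction on m. First we can check
that the statement is certainly true for m = 1. Now we verify the induction step.
Suppose that it holds for $m-1$. Thus we suppose that

$$\det(A^{(1,1)}) = \prod_{2 \le i < j \le m} (\beta_i + \beta_j)(\gamma_i + \gamma_j) \prod_{2 \le i,j \le m} a_{ij},$$

where $A^{(i,j)}$ is the $(m-1) \times (m-1)$ matrix that results from A by removing
the i-th row and the j-th column.

We wish to show that it also holds for m. Let $B = (b_{ij})_{m \times m}$ with $b_{1j} = a_{1j}$
and for $i > 1$,

$$\begin{aligned}
b_{ij} &= a_{ij} + a_{i1} a_{11}^{-1} a_{1j} \\
&= \frac{1}{1 + \beta_i\gamma_j} + \frac{1}{1 + \beta_i\gamma_1} \left(\frac{1}{1 + \beta_1\gamma_1}\right)^{-1} \frac{1}{1 + \beta_1\gamma_j} \\
&= \frac{(1 + \beta_i\gamma_1)(1 + \beta_1\gamma_j) + (1 + \beta_1\gamma_1)(1 + \beta_i\gamma_j)}{(1 + \beta_i\gamma_j)(1 + \beta_i\gamma_1)(1 + \beta_1\gamma_j)} \\
&= \frac{\beta_i\gamma_1 + \beta_1\gamma_j + \beta_1\gamma_1 + \beta_i\gamma_j}{(1 + \beta_i\gamma_j)(1 + \beta_i\gamma_1)(1 + \beta_1\gamma_j)} \\
&= \frac{(\beta_1 + \beta_i)(\gamma_1 + \gamma_j)}{(1 + \beta_i\gamma_j)(1 + \beta_i\gamma_1)(1 + \beta_1\gamma_j)} \\
&= a_{ij} \cdot (\beta_1 + \beta_i) a_{i1} \cdot (\gamma_1 + \gamma_j) a_{1j}.
\end{aligned}$$

Let

$$P = \mathrm{diag}(1, (\beta_1 + \beta_2) a_{21}, \ldots, (\beta_1 + \beta_m) a_{m1})$$

and

$$Q = \mathrm{diag}(1, (\gamma_1 + \gamma_2) a_{12}, \ldots, (\gamma_1 + \gamma_m) a_{1m}),$$

where $\mathrm{diag}(x_1, \ldots, x_m)$ denotes a diagonal matrix whose diagonal entries
starting in the upper left corner are $x_1, \ldots, x_m$. Then

$$B = P \begin{pmatrix} a_{11} & \ast \\ 0 & A^{(1,1)} \end{pmatrix} Q.$$

Hence

$$\begin{aligned}
\det(A) = \det(B) &= \det(P) \cdot a_{11} \det(A^{(1,1)}) \cdot \det(Q) \\
&= \prod_{i=2}^{m} (\beta_1 + \beta_i) a_{i1} \cdot a_{11} \det(A^{(1,1)}) \cdot \prod_{i=2}^{m} (\gamma_1 + \gamma_i) a_{1i} \\
&= \prod_{1 \le i < j \le m} (\beta_i + \beta_j)(\gamma_i + \gamma_j) \prod_{1 \le i,j \le m} a_{ij}.
\end{aligned}$$

It has now been proved by mathematical induction that the first half of this
lemma holds for all positive integers m. □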

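Lemma 11. Let $A = (a_{ij})_{m \times m}$ and $B = (b_{ij})_{m \times m}$ be $m \times m$ matrices with
$a_{ij} = \beta_i\gamma_j b_{ij}$ and $\beta_i \ne 0$, $\gamma_i \ne 0$ for $1 \le i,j \le m$. Then $\det(A) \ne 0$ if and only
if $\det(B) \ne 0$.

Proof. Let $P = \mathrm{diag}(\beta_1, \beta_2, \ldots, \beta_m)$ and $Q = \mathrm{diag}(\gamma_1, \gamma_2, \ldots, \gamma_m)$. Then A =
PBQ and hence $\det(A) = \det(B) \prod_{i=1}^{m} \beta_i\gamma_i$, which proves this lemma. □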

Proposition 12. Let e be a positive integer less than n/2 and f be the function
of (10). Then $U(f;e)$ is invertible if $\binom{n-1}{e} \equiv 0 \pmod 2$, and $U(f;e,n-e-2)$
has full column rank if $\binom{n-1}{e} \equiv 1 \pmod 2$.

Proof. Let $U = U(f;e)$ and $u_{ij}$ be the (i,j)-th element of U. We have
$u_{11} = f_{2^n-1} = 0$. By Remark 7 we know that U is a symmetric matrix of order $\sum_{i=0}^{e}\binom{n}{i}$
in the form of (6). For the case $\binom{n-1}{e} \bmod 2 = 0$, we have $\sum_{i=0}^{e}\binom{n}{i} \bmod 2 = 0 = u_{11}$.
By Remark 3 it holds that $\det(U) = \det(U^{(1,1)})$. Remark 8 shows that the
(i,j)-th element of $U^{(1,1)}$ is

$$u^{(1,1)}_{ij} = \frac{\alpha^{-yl}\,\alpha^{zl}}{1 + \alpha^{-y/2}\,\alpha^{z/2}},$$

where y is the i-th element in $\overline{W}_{n-e-1} \setminus \{2^n - 1\}$ and z is the j-th element in
$W_e \setminus \{0\}$, since $e \le n-e-1$ for $e < n/2$. Let $U^*$ be a
$\left(\sum_{i=0}^{e}\binom{n}{i} - 1\right) \times \left(\sum_{i=0}^{e}\binom{n}{i} - 1\right)$ matrix with the (i,j)-th element equal to

$$u^*_{ij} = \frac{1}{1 + \alpha^{-y/2}\,\alpha^{z/2}}.$$

Since $\alpha^{-y/2} \ne \alpha^{-y'/2}$ for $y \ne y'$ ($y, y' \in \overline{W}_{n-e-1} \setminus \{2^n - 1\}$) and $\alpha^{z/2} \ne \alpha^{z'/2}$
for $z \ne z'$ ($z, z' \in W_e \setminus \{0\}$), from Lemma 10 we have $\det(U^*) \ne 0$. Then by
Lemma 11 it holds that $\det(U^{(1,1)}) \ne 0$. Hence, U is invertible.

For the case $\binom{n-1}{e} \bmod 2 = 1$, we consider the $\sum_{i=0}^{e+1}\binom{n}{i} \times \sum_{i=0}^{e}\binom{n}{i}$ matrix
$U(f;e,n-e-2)$. For even n, we always have $e \le n-e-2$ for $e < n/2$. For
odd n, we always have $e \le n-e-2$ for $e \le (n-3)/2$ and $\binom{n-1}{e} \bmod 2 = 0$ for
$e = (n-1)/2$. Thus for $\binom{n-1}{e} \bmod 2 = 1$ and $e < n/2$, we always have $e \le n-e-2$.
Let $U^{**}$ be the $\sum_{i=0}^{e}\binom{n}{i} \times \sum_{i=0}^{e}\binom{n}{i}$ matrix that results from $U(f;e,n-e-2)$
by removing the first $\binom{n}{e+1}$ rows. A proof similar to that of $\det(U^{(1,1)}) \ne 0$ also applies
to $\det(U^{**}) \ne 0$. Then $U(f;e,n-e-2)$ has full column rank. □

The proof of the proposition shows that for the function f of (10) the rank of
the matrix $U(f;e)$ is at least $\sum_{i=0}^{e}\binom{n}{i} - 1$ (since the matrix $U^{(1,1)}$ is invertible).
Then, by Theorem 5, f admits a unique nonzero function g with algebraic degree
e such that gf has algebraic degree at most $n-e-1$ when $\binom{n-1}{e} \equiv 1 \pmod 2$.

Theorem 13. Let e be a positive integer less than n/2 and f be the function of
(10). Then f admits no nonzero function g with algebraic degree at most e such
that gf has algebraic degree at most $n-e-1$ if $\binom{n-1}{e} \equiv 0 \pmod 2$, and admits
no nonzero function g with algebraic degree at most e such that gf has algebraic
degree at most $n-e-2$ if $\binom{n-1}{e} \equiv 1 \pmod 2$.

Proof. It is derived from Theorem 8 and Proposition 12. □

Corollary 14. Let $n = 2^s + 1$ and $f \in \mathbf{B}_n$ be the function of (10). Then f is
$\mathcal{PAI}$.

Proof. It is obtained from Theorem 13 since $\binom{n-1}{e} = \binom{2^s}{e} \equiv 0 \pmod 2$ for
$1 \le e < n/2$. □
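
For very small n the statement can be confirmed by direct computation. The sketch below is an illustration, not the authors' verification: it builds the truth table of the function with support (10) over $\mathbb{F}_{2^n}$, using assumed polynomials (for example $x^5 + x^2 + 1$ for n = 5) with $\alpha$ the class of x, converts it to the ANF, and tests perfect algebraic immunity through the rank criterion on the matrices with entries $f_{y \cap \bar{z}}$, rows $\mathrm{wt}(y) \ge n-e$ and columns $\mathrm{wt}(z) \le e$. Field elements are identified with vectors of $\mathbb{F}_2^n$ through the polynomial basis, which is a linear map and so does not change algebraic degrees. All helper names are hypothetical.

```python
def weight(x):
    return bin(x).count("1")


def gf_mul(a, b, n, poly):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> n) & 1:
            a ^= poly
    return r


def anf(truth_table):
    """Binary Moebius transform: truth table -> ANF coefficients."""
    a = list(truth_table)
    for k in range(len(a).bit_length() - 1):
        for x in range(len(a)):
            if x & (1 << k):
                a[x] ^= a[x ^ (1 << k)]
    return a


def rank_gf2(rows):
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def carlet_feng(n, poly, l=0, include_zero=False):
    """Truth table with support {alpha^l, ..., alpha^(l+2^(n-1)-1)};
    include_zero=True also puts 0 in the support."""
    order = (1 << n) - 1
    tt = [0] * (1 << n)
    x = 1                                     # x runs through alpha^0, alpha^1, ...
    for k in range(order):
        if (k - l) % order < (1 << (n - 1)):
            tt[x] = 1
        x = gf_mul(x, 2, n, poly)
    if include_zero:
        tt[0] = 1
    return tt


def is_pai(truth_table):
    """Rank test for every positive e < n/2 with d = n - e - 1."""
    n = len(truth_table).bit_length() - 1
    f = anf(truth_table)
    for e in range(1, (n + 1) // 2):
        cols = [z for z in range(1 << n) if weight(z) <= e]
        rows = [y for y in range(1 << n) if weight(y) >= n - e]
        masks = [sum(f[y & ~z] << j for j, z in enumerate(cols)) for y in rows]
        if rank_gf2(masks) < len(cols):
            return False
    return True
```

The same routine can be pointed at other small n to see the cases excluded by Theorem 7, for instance n = 6.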

Theorem 13 states that the Carlet-Feng functions achieve the bounds of Theorem
5 and thus the bounds of Theorem 5 are tight for the functions with algebraic
degree less than n, while Corollary 14 states that the Carlet-Feng functions on
$2^s + 1$ variables are $\mathcal{PAI}$. The theorem explains the experimental results of [4,10]
on the immunity of the Carlet-Feng functions against fast algebraic attacks, and
implies the conjecture of C. Carlet and K. Feng [4, Section 5].

Next we consider the Boolean functions with algebraic degree equal to n.

Let n be an integer and $\alpha$ a primitive element of $\mathbb{F}_{2^n}$. Let $f \in \mathbf{B}_n$ and

$$\mathrm{supp}(f) = \{0, \alpha^l, \alpha^{l+1}, \ldots, \alpha^{l+2^{n-1}-1}\}, \quad 0 \le l \le 2^n - 2. \tag{11}$$

The function of (11) is a function that results from the function of (10) by
flipping the output at x = 0.

A proof similar to that of Proposition 9 applies to the following result.

Proposition 15. Let $\sum_{i=0}^{2^n-1} f_i x^i$ be the univariate polynomial representation of
the function f of (11). Then $f_0 = 1$, $f_{2^n-1} = 1$, and for $1 \le i \le 2^n - 2$,

$$f_i = \frac{\alpha^{-il}}{1 + \alpha^{-i/2}}.$$

Hence the algebraic degree of f is equal to n.

A proof similar to that of Proposition 12 also applies to the following result.

Proposition 16. Let e be a positive integer less than $\frac{n-1}{2}$ and f be the function
of (11). Then $U(f;e)$ is invertible if $\binom{n-1}{e} \equiv 1 \pmod 2$, and $U(f;e,n-e-2)$
has full column rank if $\binom{n-1}{e} \equiv 0 \pmod 2$.

Theorem 17. Let e be a positive integer less than $\frac{n-1}{2}$ and f be the function of
(11). Then f admits no nonzero function g with algebraic degree at most e such
that gf has algebraic degree at most $n-e-1$ if $\binom{n-1}{e} \equiv 1 \pmod 2$, and admits
no nonzero function g with algebraic degree at most e such that gf has algebraic
degree at most $n-e-2$ if $\binom{n-1}{e} \equiv 0 \pmod 2$.

Proof. It is confirmed by Theorem 8 and Proposition 16. □

Similarly to the function of (10), the function of (11) admits a unique nonzero
function g with algebraic degree e such that gf has algebraic degree at most
$n-e-1$ when $\binom{n-1}{e} \equiv 0 \pmod 2$.

In Theorem 17 we do not consider the case $e = \frac{n-1}{2}$ for odd n, since Theorem
5 shows that for odd n, an n-variable function f with algebraic degree n admits a
nonzero function g with algebraic degree at most $\frac{n-1}{2}$ such that gf has algebraic
degree at most $\frac{n-1}{2}$ (noting that $\binom{n-1}{(n-1)/2} \bmod 2 = 0$).

Corollary 18. Let $n = 2^s$ and $f \in \mathbf{B}_n$ be the function of (11). Then f is $\mathcal{PAI}$.

Proof. It is obtained from Theorem 17 since $\binom{n-1}{e} = \binom{2^s-1}{e} \equiv 1 \pmod 2$ for
$1 \le e < n/2$. □

Theorem 17 states that the modified Carlet-Feng functions achieve the bounds
of Theorem 5 and thus the bounds of Theorem 5 are tight for the functions
with algebraic degree equal to n, while Corollary 18 states that the modified
Carlet-Feng functions on $2^s$ variables are $\mathcal{PAI}$.

Consequently, as mentioned above, the bounds of Theorem 5 are tight and
there exist $\mathcal{PAI}$ functions on $2^s$ and $2^s + 1$ variables. More precisely, there exist
n-variable $\mathcal{PAI}$ functions with degree $n-1$ (balanced functions) if and only
if $n = 2^s + 1$; there exist n-variable $\mathcal{PAI}$ functions with degree n (unbalanced
functions) if and only if $n = 2^s$.

5 Conclusion 

In this paper, several open problems about the immunity of Boolean functions 
against fast algebraic attacks have been solved. We proved the maximum im- 
munity to fast algebraic attacks, and identified the immunity of the Carlet-Feng 
functions against fast algebraic attacks. It seems that for a balanced function, in 
terms of the immunity to fast algebraic attacks, the optimal value of the number 
n of input variables is one more than a power of two. The Carlet-Feng functions 
previously shown to have maximum algebraic immunity and good nonlinearity 
are proved to be optimal against fast algebraic attacks among the balanced
functions. To the best of our knowledge this is the first time that a class of Boolean
functions has been shown to have such a cryptographic property.




Acknowledgement. The authors thank the anonymous referees for their valu- 
able comments on this paper. The authors are also grateful to Tianze Wang 
for his careful reading of the manuscript, and to Shaoyu Du, Lin Jiao, Yao Lu, 
Wenlun Pan, Tao Shi, and Wenhao Wang for their participation in FAA sem- 
inar at SKLOIS in December 2011. The first author would especially like to 
thank Dingyi Pei for his enlightening conversations on the resistance of Boolean 
functions against algebraic attacks. 

References 

1. Armknecht, F.: Improving Fast Algebraic Attacks. In: Roy, B., Meier, W. (eds.) 
FSE 2004. LNCS, vol. 3017, pp. 65-82. Springer, Heidelberg (2004) 

2. Armknecht, F., Carlet, C., Gaborit, P., Künzli, S., Meier, W., Ruatta, O.: Efficient
Computation of Algebraic Immunity for Algebraic and Fast Algebraic Attacks. In: 
Vaudenay, S. (ed.) EUROCRYPT 2006. LNCS, vol. 4004, pp. 147-164. Springer, 
Heidelberg (2006) 

3. Carlet, C.: Boolean functions for cryptography and error correcting codes. In: 
Crama, Y., Hammer, P. (eds.) Boolean Methods and Models in Mathematics, 
Computer Science, and Engineering, pp. 257-397. Cambridge University Press, 
Cambridge (2010) 

4. Carlet, C., Feng, K.: An Infinite Class of Balanced Functions with Optimal Alge- 
braic Immunity, Good Immunity to Fast Algebraic Attacks and Good Nonlinearity. 
In: Pieprzyk, J. (ed.) ASIACRYPT 2008. LNCS, vol. 5350, pp. 425-440. Springer, 
Heidelberg (2008) 

5. Courtois, N.T., Meier, W.: Algebraic Attacks on Stream Ciphers with Linear Feed- 
back. In: Biham, E. (ed.) EUROCRYPT 2003. LNCS, vol. 2656, pp. 345-359. 
Springer, Heidelberg (2003) 

6. Courtois, N.T.: Fast Algebraic Attacks on Stream Ciphers with Linear Feedback. 
In: Boneh, D. (ed.) CRYPTO 2003. LNCS, vol. 2729, pp. 176-194. Springer, Hei- 
delberg (2003) 

7. Courtois, N.T.: Cryptanalysis of Sfinks. In: Won, D.H., Kim, S. (eds.) ICISC 2005. 
LNCS, vol. 3935, pp. 261-269. Springer, Heidelberg (2006) 

8. Dalai, D.K., Maitra, S., Sarkar, S.: Basic theory in construction of Boolean func- 
tions with maximum possible annihilator immunity. Designs, Codes and Cryptog- 
raphy 40(1), 41-58 (2006) 

9. Dalai, D.K., Gupta, K.C., Maitra, S.: Results on Algebraic Immunity for Crypto- 
graphically Significant Boolean Functions. In: Canteaut, A., Viswanathan, K. (eds.) 
INDOCRYPT 2004. LNCS, vol. 3348, pp. 92-106. Springer, Heidelberg (2004) 

10. Du, Y., Zhang, F., Liu, M.: On the Resistance of Boolean Functions against Fast 
Algebraic Attacks. In: Kim, H. (ed.) ICISC 2011. LNCS, vol. 7259, pp. 261-274. 
Springer, Heidelberg (2012) 

11. Feng, K., Liao, Q., Yang, J.: Maximal values of generalized algebraic immunity. 
Designs, Codes and Cryptography 50(2), 243-252 (2009) 

12. Fischer, S., Meier, W.: Algebraic Immunity of S-Boxes and Augmented Functions.
In: Biryukov, A. (ed.) FSE 2007. LNCS, vol. 4593, pp. 366-381. Springer, Heidel- 
berg (2007) 

13. Gong, G.: Sequences, DFT and Resistance against Fast Algebraic Attacks. In: 
Golomb, S.W., Parker, M.G., Pott, A., Winterhof, A. (eds.) SETA 2008. LNCS, 
vol. 5203, pp. 197-218. Springer, Heidelberg (2008) 





14. Hawkes, P., Rose, G.G.: Rewriting Variables: The Complexity of Fast Algebraic 
Attacks on Stream Ciphers. In: Franklin, M. (ed.) CRYPTO 2004. LNCS, vol. 3152, 
pp. 390-406. Springer, Heidelberg (2004) 

15. Li, N., Qu, L., Qi, W., Feng, G., Li, C., Xie, D.: On the construction of Boolean 
Functions with optimal algebraic immunity. IEEE Transactions on Information 
Theory 54(3), 1330-1334 (2008) 

16. Li, N., Qi, W.-F.: Construction and Analysis of Boolean Functions of 2t+1
Variables with Maximum Algebraic Immunity. In: Lai, X., Chen, K. (eds.) ASIACRYPT
2006. LNCS, vol. 4284, pp. 84-98. Springer, Heidelberg (2006) 

17. Liu, M., Lin, D., Pei, D.: Fast algebraic attacks and decomposition of symmetric 
Boolean functions. IEEE Transactions on Information Theory 57(7), 4817-4821
(2011)

18. Liu, M., Pei, D., Du, Y.: Identification and construction of Boolean functions with 
maximum algebraic immunity. Science China Information Sciences 53(7), 1379- 
1396 (2010) 

19. MacWilliams, F.J., Sloane, N.J.A.: The theory of error correcting codes. North- 
Holland, New York (1977) 

20. Meier, W., Pasalic, E., Carlet, C.: Algebraic Attacks and Decomposition of Boolean 
Functions. In: Cachin, C., Camenisch, J.L. (eds.) EUROCRYPT 2004. LNCS, 
vol. 3027, pp. 474-491. Springer, Heidelberg (2004) 

21. Pasalic, E.: Almost Fully Optimized Infinite Classes of Boolean Functions Resistant 
to (Fast) Algebraic Cryptanalysis. In: Lee, P.J., Cheon, J.H. (eds.) ICISC 2008. 
LNCS, vol. 5461, pp. 399-414. Springer, Heidelberg (2009) 

22. Rizomiliotis, P.: On the resistance of Boolean functions against algebraic attacks us- 
ing univariate polynomial representation. IEEE Transactions on Information The- 
ory 56(8), 4014-4024 (2010) 

23. Rizomiliotis, P.: On the security of the Feng-Liao-Yang Boolean functions with 
optimal algebraic immunity against fast algebraic attacks. Designs, Codes and 
Cryptography 57(3), 283-292 (2010) 

24. Tu, Z., Deng, Y.: A conjecture about binary strings and its applications on con- 
structing Boolean functions with optimal algebraic immunity. Designs, Codes and 
Cryptography 60(1), 1-14 (2011) 

25. Zeng, X., Carlet, C., Shan, J., Hu, L.: More balanced Boolean functions with 
optimal algebraic immunity and good nonlinearity and resistance to fast algebraic 
attacks. IEEE Transactions on Information Theory 57(9), 6310-6320 (2011) 

26. Zhang, Y., Liu, M., Lin, D.: On the immunity of rotation symmetric Boolean
functions against fast algebraic attacks. Cryptology ePrint Archive, Report 2012/111,
http://eprint.iacr.org/


Differential Analysis of the LED Block Cipher*


Florian Mendel, Vincent Rijmen, Deniz Toz, and Kerem Varici 

KU Leuven, ESAT/COSIC and IBBT, Belgium 
{florian.mendel,vincent.rijmen,deniz.toz,kerem.varici}@esat.kuleuven.be


Abstract. In this paper, we present a security analysis of the lightweight 
block cipher LED proposed by Guo et al. at CHES 2011. Since the design 
of LED is very similar to the Even-Mansour scheme, we first review exist- 
ing attacks on this scheme and extend them to related-key and related- 
key-cipher settings before we apply them to LED. We obtain results for 
12 and 16 rounds (out of 32) for LED-64 and 16 and 24 rounds (out of 48) 
for LED-128. Furthermore, we present an observation on full LED in the 
related-key-cipher setting¹. For all these attacks we need to find good
differentials for one step (4 rounds) of LED. Therefore, we extend the study
of plateau characteristics for AES-like structures from two rounds to four
rounds when the key addition is replaced with a constant addition. We
introduce an algorithm that can be used to find good differentials and
right pairs for one step of LED. To be more precise, we can find more than
$2^{10}$ right pairs for one step of LED with complexity of $2^{16}$ and memory
requirement of $5 \times 2^{17}$. Moreover, a similar algorithm can also be used
to find iterative characteristics for LED.


1 Introduction 

Security in embedded systems, such as RFID and sensor networks, where the
available area is restricted, is becoming more and more important as people
interact with them more often in daily life. Improving the efficiency while
preserving the security is one of the main challenges in this area and it has
been an ongoing research problem. Recently, many algorithms have been developed
to address this problem: hash functions like Quark [1], PHOTON [13], SPONGENT [?]
as well as block ciphers like Piccolo [23], LED [?], TWINE [24] and KLEIN [?].
Each of them takes advantage of the improved knowledge on the design and
analysis of symmetric key components.

LED [?] is a lightweight block cipher proposed by Guo et al. at CHES 2011.
While being dedicated to compact hardware implementation with one of the
smallest area consumptions (among block ciphers with comparable parameters),
LED also offers reasonable performance in software. The design bears some
resemblance to the (generalized) Even-Mansour construction [?] with the difference
that the same key is used in each step for LED-64 or every second step in the
case of the larger variant LED-128. The step function is based on AES-like design
principles that provide good bounds against large classes of attacks including
differential and linear cryptanalysis. Additionally, LED offers strong security
arguments against attacks even in the related-key model.

* This work was sponsored by the Research Fund KU Leuven, OT/08/027, by the IAP
Programme P6/26 BCRYPT of the Belgian State (Belgian Science Policy) and by
the European Commission through the ICT Programme under Contract ICT-2007-
216676 (ECRYPT II).

1 Due to page limitations, details of this part are given in the full version of the
paper [?].

To the best of our knowledge, no external analysis of LED with respect to 
differential cryptanalysis has been published so far. The best existing differential 
attacks are distinguishers for 15 (out of 32) rounds of LED-64 and 27 (out of 
48) rounds of LED-128 in a hash setting, where the key is known to (or even 
chosen by) the attacker, described by the designers. Moreover, the security of 
LED against meet-in-the-middle attacks has been investigated recently by Isobe
et al. [?]. They describe attacks for 8 (out of 32) and 16 (out of 48) rounds for
LED-64 and LED-128, respectively. 

Our Contribution. In this paper, we present the first external cryptanalysis
of LED with respect to differential cryptanalysis. First, we show attacks for
LED-64 reduced to 12 and 16 rounds. Furthermore, we present an observation
on full LED in the related-cipher setting [?]. All our attacks are based on the
attack of Daemen [?] on the Even-Mansour construction [?] that is extended in a
straightforward way to the related-key setting.

Secondly, we show how to improve the bound for the maximum expected
differential probability (MEDP) for four rounds (one step) of LED from $2^{-32}$ to
$2^{-41.75}$ using mega-boxes and the result of Park et al. [20].

Furthermore, we present algorithms to find differential characteristics with 
high probability that can be used in our attacks. By using the ideas of plateau
characteristics [?] and extending the work with mega-boxes [?], we are able to
obtain characteristics for four rounds of LED. We find more than $2^{10}$ right pairs
for a differential with a complexity less than $2^{16}$ time and $5 \times 2^{17}$ memory and
an iterative characteristic with six right pairs with the same complexities. We
emphasize that our method is not specific to the block cipher LED and it can 
be used in the analysis of any AES-like construction where the key addition is 
replaced with a constant addition. 

Outline. This paper is organized as follows. In Section 2 we give a brief
description of LED and introduce the required definitions for our analysis. In
Section 3 we describe the attacks on the Even-Mansour construction and show how
they can be extended to attack LED. We continue with differential analysis and
give an algorithm to find the number of right pairs in a plateau characteristic in
Section 4. We generalize this algorithm to find characteristics for four rounds of
LED in Section 5 and we provide the results for characteristics with high
probability and iterative characteristics that can be used in our attacks in
Section 6.

2 Description of LED

LED [?] is a conservative lightweight block cipher whose design can be seen as a
special case of the generalized Even and Mansour construction [11] depicted in
Figure 1.



Fig. 1. Even-Mansour construction with (a) t = 1 and (b) t = 2


LED accepts a 64-bit plaintext P, represented by a 4 x 4 array, and a 64-bit 
(or 128-bit) user key as inputs, and is composed of 8 (or 12) STEP functions 
preceded by a key addition. The STEP function is an AES-like design composed
of four rounds. Each round is a combination of constant addition, S-boxes,
ShiftRows, and (a variant of) MixColumns. LED uses the PRESENT S-box. In
MixColumnsSerial, each column vector is multiplied by a matrix and replaced 
with the resulting vector. Note that the round constants for the second col- 
umn are obtained from a linear shift register while the round constants for the 
remaining three columns do not change. 
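The diffusion property of MixColumnsSerial can be checked with a small script. The following is only a sketch under stated assumptions: the serial matrix row (4, 1, 2, 2), its fourth power, and the field polynomial x^4 + x + 1 are taken from the LED specification as we recall it, not from the text above, and all function names are illustrative. It builds the MixColumnsSerial matrix and computes its branch number over 4-bit cells by exhaustive search.

```python
from itertools import product

def gf16_mul(a, b):
    """Multiply two elements of GF(2^4) modulo x^4 + x + 1 (assumed LED field)."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13  # reduce by x^4 + x + 1
    return r

def mat_mul(A, B):
    """Multiply two 4x4 matrices over GF(2^4)."""
    C = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            acc = 0
            for k in range(4):
                acc ^= gf16_mul(A[i][k], B[k][j])
            C[i][j] = acc
    return C

# Assumed serial (companion-style) matrix; MixColumnsSerial applies it four times.
SERIAL = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [4, 1, 2, 2]]
MIX = SERIAL
for _ in range(3):
    MIX = mat_mul(MIX, SERIAL)

def mix_column(col):
    """Multiply one column (four 4-bit cells) by MIX."""
    out = []
    for row in MIX:
        acc = 0
        for coeff, cell in zip(row, col):
            acc ^= gf16_mul(coeff, cell)
        out.append(acc)
    return out

def branch_number():
    """Minimum number of non-zero input plus output cells over all non-zero inputs."""
    best = 8
    for col in product(range(16), repeat=4):
        if any(col):
            weight = sum(c != 0 for c in col) + sum(c != 0 for c in mix_column(col))
            best = min(best, weight)
    return best
```

Under these assumptions the matrix is MDS, so every non-zero input column activates at least five cells in total; this is the cell-level counterpart of the branch number of SMS used later for the mega-box.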

Key Schedule: LED has a simple key schedule where the 64-bit user key $K$ is
used as it is in each step whereas the 128-bit key is divided into two parts,
$K = K_0 \| K_1$, and used alternately. For the remainder of this paper, we refer
to these two versions as LED-64 and LED-128. For more detail, please check the
specification of LED [?].

One observation is that the S-boxes and linear transformations in the round
function of the cipher can be described by the structure of a super box:

Definition 1 (Super box [?]). A super box maps an array $a$ of $m$ elements
$a_i$ to an array $e$ of $m$ elements $e_i$. Each of the elements has size $n$. A super
box takes a key $k$ of size $m \times n = n_b$ where $n_b$ is the block size. It consists of
the sequence of four transformations (layers): Substitution, Mixing, Round Key
Addition, Substitution.

Similar to AES [?], two rounds of LED can also be described alternatively as four
parallel instances of the LED super box where the key addition is replaced by the 
constant addition. So, instead of dealing with the classical 4-bit S-boxes, one can 
consider 16-bit super boxes each composed of two S-box layers surrounding one 
MixColumnsSerial (MC) and one AddConstants (AC) function. 

Four rounds of LED can be described as a mega-box, where the elements are 16- 
bit words and the LED super boxes defined above are seen as S-boxes. The linear 
transformation in the middle is a combination of ShiftRows, MixColumnsSerial 
and ShiftRows respectively. We will refer to this linear transformation as SMS.

3 Attacks on the Even-Mansour Construction and
Application to LED

The Even-Mansour construction is a simple and yet provably secure block cipher
construction. The designers have shown that the number of queries needed
to break the scheme is bounded by $2^{n/2}$, where $n$ is the block length ($n = 64$
for LED). A generic key recovery attack with chosen plaintexts showing that
this bound is tight was introduced by Daemen [?]. Twenty years later, the
construction was revisited. It was shown that the same bound applies to the known
plaintext setting by using the slidex attack, an extended version of the slide
attack [10].

Simultaneously, Bogdanov et al. generalized the construction in [?] to more
steps and discussed its security. They even provided a security proof for the 
construction in the single-key setting. However, as pointed out by the authors, 
the scheme is insecure in the related-key setting. In this section, we focus on the 
attack of Daemen on the Even-Mansour construction, since it is the basis for all 
our attacks on LED. First we show how it can be extended to a related key attack 
on the generalized Even-Mansour construction. Then, we will use it to attack 
reduced versions of the LED block cipher. 

3.1 Daemen’s Attack 

At Asiacrypt 1991 Daemen presented a generic key-recovery attack with
complexity of $2^{n/2}$ [?]. It can be summarized as follows.

1. Choose a difference $\Delta$.

2. For $\ell$ values of $a$ compute $\Delta F_0 = F_0(a) \oplus F_0(a \oplus \Delta)$ and save the pair
$(\Delta F_0, a)$ in a list $L$.

3. Choose an arbitrary plaintext $P$ with $P' = P \oplus \Delta$ and ask for the ciphertexts
$C$ and $C'$.

4. Compute $\Delta C = C \oplus C'$ and check if $\Delta C$ is in the list $L$ to get $a$.

- If $\Delta C$ is in the list $L$ then a candidate for the key is found. Compute
$K_0 = a \oplus P$ and $K_1 = F_0(a) \oplus C$.

- Else go back to Step 3.

After repeating steps 3-4 about $2^n/\ell$ times one expects to find the correct key
with complexity of about $2^n/\ell + \ell$. Obviously the attack has the best complexity
by choosing $\ell = 2^{n/2}$, resulting in a final attack complexity of about $2^{n/2}$ and
similar memory requirements.

Note that the attack can be applied in an iterative way to attack the Even-
Mansour construction with $t > 1$ with complexity of $2^{t \cdot n/2}$ and similar memory
requirements. For instance, if $t = 2$ then we get a complexity of $2^n$.
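The four steps of Daemen's attack can be illustrated on a toy instance. The sketch below is not taken from the paper: it assumes a 12-bit block, a seeded random permutation standing in for $F_0$, and hypothetical helper names. Because a table entry $a$ may correspond to either member of the pair $\{a, a \oplus \Delta\}$, both are tried, and each key candidate is confirmed on a few extra known plaintexts.

```python
import random

N_BITS = 12                    # toy block size; LED has n = 64
SIZE = 1 << N_BITS

def make_permutation(seed):
    """Seeded random permutation used as a stand-in for the public function F_0."""
    table = list(range(SIZE))
    random.Random(seed).shuffle(table)
    return table

def even_mansour(F, k0, k1, p):
    """Single-step Even-Mansour: C = F(P xor K0) xor K1."""
    return F[p ^ k0] ^ k1

def daemen_attack(F, oracle, delta=1, list_size=1 << (N_BITS // 2)):
    """Recover (K0, K1) from chosen-plaintext access to `oracle`."""
    # Step 2: store dF = F(a) xor F(a xor delta) for list_size values of a.
    table = {}
    for a in range(list_size):
        table.setdefault(F[a] ^ F[a ^ delta], []).append(a)
    # Known plaintext/ciphertext pairs used to confirm a candidate key.
    checks = [(q, oracle(q)) for q in (0, 5, 1234, 4000)]
    # Steps 3-4: query pairs (P, P xor delta) and look up the ciphertext difference.
    for p in range(SIZE):
        c = oracle(p)
        dc = c ^ oracle(p ^ delta)
        for a in table.get(dc, []):
            for x in (a, a ^ delta):        # internal value may be a or a xor delta
                k0, k1 = x ^ p, F[x] ^ c
                if all(F[q ^ k0] ^ k1 == cq for q, cq in checks):
                    return k0, k1
    return None
```

With `list_size` set to $2^{n/2}$ a match is expected after about $2^{n/2}$ plaintext pairs, which mirrors the trade-off $2^n/\ell + \ell$ stated above.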


3.2 Using Daemen’s Attack in a Related-Key Setting 

In certain scenarios one considers also related-key attacks where the adversary is
allowed to get encryptions under several related keys. In this setting Daemen's
attack can be adapted to attack $t$ steps of the Even-Mansour construction with
complexity of $t \cdot 2^{n/2}$ and similar memory requirements. For the sake of simplicity
we first describe the attack for $t = 2$.

Related Key Attack with t = 2. Let $K, K'$ be two related keys, where
$K = K_0 \| K_1 \| K_2$ and $K' = K_0 \oplus \Delta_0 \| K_1 \oplus \Delta_1 \| K_2 \oplus \Delta_2$, with arbitrary (but
known) $\Delta_0, \Delta_1, \Delta_2$ and $\Delta_1 \neq 0$. Then we can do a key recovery attack on
the Even-Mansour construction with $t = 2$ with complexity of roughly $2^{n/2}$
and similar memory requirements using the attack of Daemen [?]. It can be
summarized as follows.

1. For $\ell$ values of $a$ compute $\Delta F_1 = F_1(a) \oplus F_1(a \oplus \Delta_1)$ and save the pair
$(\Delta F_1, a)$ in a list $L$.

2. Choose an arbitrary $P$ and $P' = P \oplus \Delta_0$ and ask for the ciphertexts $C$ and
$C'$.

3. Compute $\Delta C = C \oplus (C' \oplus \Delta_2)$ and check if $\Delta C$ is in the list $L$ to get $a$.

- If $\Delta C$ is in the list $L$ then a candidate for $K_2$ is found, $K_2 = F_1(a) \oplus C$.

- Else go back to Step 2.

After repeating steps 2-3 about $2^n/\ell$ times, the expected number of matches in
the list $L$ (i.e., candidates for $K_2$) is at least one. Note that, if we have more than
one candidate for $K_2$ then we have to repeat the attack to get new candidates
for $K_2$. The intersection of both sets of candidates gives us the correct key. Note
that it is very unlikely that this intersection will have more than one solution.

Once $K_2$ is known one can apply the attack of Daemen to find $K_0$ and $K_1$. This
results in a final attack complexity of about $2 \cdot 2^n/\ell + 2\ell$ and memory requirements
of $\ell$. Again, the attack has the best complexity by choosing $\ell \approx 2^{n/2}$, resulting
in a final attack complexity of about $2 \cdot 2^{n/2}$ and memory requirements of $2^{n/2}$.
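The recovery of $K_2$ in the related-key setting can be sketched in the same toy model. This is an illustration under assumptions, not the authors' implementation: the block size is 12 bits, $F_0$ and $F_1$ are seeded random permutations, all names are hypothetical, and instead of intersecting candidate sets from repeated runs (as described above) the sketch scans the whole toy codebook and takes the most frequently suggested candidate.

```python
import random
from collections import Counter

N_BITS = 12                    # toy block size; LED has n = 64
SIZE = 1 << N_BITS

def make_permutation(seed):
    """Seeded random permutation used as a stand-in for a public step function."""
    table = list(range(SIZE))
    random.Random(seed).shuffle(table)
    return table

def em2(F0, F1, key, p):
    """Two-step Even-Mansour: C = F1(F0(P xor K0) xor K1) xor K2."""
    k0, k1, k2 = key
    return F1[F0[p ^ k0] ^ k1] ^ k2

def recover_k2(F1, enc, enc_related, d0, d1, d2, list_size):
    """Suggest K2 from queries under K (enc) and the related key K' (enc_related)."""
    # Step 1: list of dF1 = F1(a) xor F1(a xor d1).
    table = {}
    for a in range(list_size):
        table.setdefault(F1[a] ^ F1[a ^ d1], []).append(a)
    votes = Counter()
    # Steps 2-3: P under K and P xor d0 under K' collide after the first key addition.
    for p in range(SIZE):
        c = enc(p)
        dc = c ^ enc_related(p ^ d0) ^ d2
        for a in table.get(dc, []):
            votes[F1[a] ^ c] += 1          # F1 input was a
            votes[F1[a ^ d1] ^ c] += 1     # F1 input was a xor d1
    return votes.most_common(1)[0][0]
```

Voting is used here only because the toy codebook is small enough to enumerate; for realistic block sizes one would stop after the first few matches and intersect the candidate sets.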

Related Key Attack with t > 2. The related key attack can be extended 
to more steps by applying the attack for t = 2 iteratively using more related 
keys with certain properties. Assume $t = 3$ and there are two related keys
$K = K_0 \| K_1 \| K_2 \| K_3$ and $K' = K_0 \oplus \Delta_0 \| K_1 \| K_2 \oplus \Delta_2 \| K_3 \oplus \Delta_3$, with arbitrary (but
known) $\Delta_0, \Delta_2, \Delta_3$ and $\Delta_2 \neq 0$. Then one can find $K_3$ similarly to the attack
on the Even-Mansour construction with $t = 2$ with a complexity of roughly $2^{n/2}$.
Once $K_3$ is found one can apply the attack for $t = 2$ with another pair of related
keys to recover $K_0$, $K_1$ and $K_2$. In general, one can find the key for $t = i$ using
$i$ related keys with certain properties.


3.3 Attacks on Reduced LED 

In this section, we will discuss the application of the attacks described in the 
previous section to the LED block cipher. Due to the fact that in LED the same key 
is used more than once the number of steps that can be attacked is significantly 
reduced. However, the attack can still be used in a straightforward way to break 
one and two steps of LED-64 in a single-key and related-key setting, respectively. 




Both attacks have a complexity of about $2^{n/2}$ and similar memory requirements.
Note that a similar related-key attack was described recently in [?].

However, both attacks can be extended to more steps in the case of LED-128.
In more detail, we can attack four and six steps of LED-128 in the single-key and
related-key setting, respectively. First, we describe an attack on four steps of
LED-128 based on Daemen's attack. It is based on the following simple observation
(cf. Figure 2).



Fig. 2. Structure of LED-128 with t = 4 

Assume $K_0$ is known, then one can peel off the first and last key addition.
Thus, the attacker can remove one iteration at each side of the cipher with a
complexity of about $2^{64}$ tries on $K_0$. Moreover, assuming that $K_0$ is known, two
steps of LED-128 can be viewed as one big iteration using only $K_1$. In other
words, we get a 'new' Even-Mansour construction with $t = 1$ and one key $K_1$
where we can apply Daemen's attack to recover the key. Using this, one can find
$K_0$ and $K_1$ for four steps of LED-128 with a complexity of about $2^{3n/2}$. It can be
summarized as follows.

1. Guess the key $K_0$.

2. For $2^{n/2}$ values $a$ and a fixed $\Delta$ compute $\Delta F^* = F^*(a) \oplus F^*(a \oplus \Delta)$ with
$F^*(a) = F_2(F_1(a) \oplus K_0)$ and save the pair $(\Delta F^*, a)$ in a list $L$.

3. Choose an arbitrary $P$ and compute $P' = F_0^{-1}(x \oplus \Delta) \oplus K_0$ with
$x = F_0(P \oplus K_0)$. Ask for the ciphertexts $C$ and $C'$.

4. Compute $\Delta y = y \oplus y'$ with $y = F_3^{-1}(C \oplus K_0)$ and $y' = F_3^{-1}(C' \oplus K_0)$. Check
if $\Delta y$ is in the list $L$ to get $a$.

- If $\Delta y$ is in the list $L$ then a candidate for the key is found. Compute
$K_1 = a \oplus x$.

- Else go back to Step 3.

5. Once $K_1$ is found check if the key $K = K_0 \| K_1$ is correct.

Since the expected number of $K_0$ guesses that we need to make to find the
correct key is $2^n$, we need to repeat the attack $2^n$ times. Since for each guess of
$K_0$ we need about $2^{n/2}$ computations to find $K_1$, the complexity of the attack is
roughly $2^{3n/2}$. Note that the above attack needs the whole codebook. However,
at the cost of a higher attack complexity, the data complexity of the attack can
be reduced. To be more precise, in step 3 of the attack we can always choose $P$
from a predefined subset and when computing $P'$ we check if it is also in this
subset; if not, then we repeat this step. Thus, the data complexity of the attack
can be reduced by simultaneously increasing the time complexity.

The attack can be extended to six steps of LED-128 using related keys as in the
attack on the Even-Mansour construction with $t = 2$. The attack is very similar
to the attack on four steps. Basically only steps 2-4 (Daemen's attack) are
replaced by the related key attack described in the previous section. The result
is a key-recovery attack on six steps (24 rounds) of LED-128 with complexity of
about $2^{3n/2}$. Again, as in the attack on 4 steps, the data complexity of the attack
can be reduced at the cost of a higher attack complexity.
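The five steps of the single-key attack on four steps of LED-128 can be mimicked on a toy cipher with the same key-alternation structure. The sketch below is only an illustration under assumptions: it uses an 8-bit block, four seeded random permutations in place of the LED step functions $F_0, \ldots, F_3$, and hypothetical function names.

```python
import random

N_BITS = 8                     # toy block size; LED has n = 64
SIZE = 1 << N_BITS

def random_permutation(rng):
    """Return a random permutation and its inverse as lookup tables."""
    perm = list(range(SIZE))
    rng.shuffle(perm)
    inverse = [0] * SIZE
    for x, y in enumerate(perm):
        inverse[y] = x
    return perm, inverse

def encrypt(steps, k0, k1, p):
    """Toy four-step cipher with the LED-128 key order K0, K1, K0, K1, K0."""
    (f0, _), (f1, _), (f2, _), (f3, _) = steps
    return f3[f2[f1[f0[p ^ k0] ^ k1] ^ k0] ^ k1] ^ k0

def attack_four_steps(steps, oracle, delta=1):
    (f0, f0_inv), (f1, _), (f2, _), (_, f3_inv) = steps
    known = [(q, oracle(q)) for q in (3, 77, 150, 201, 250)]
    for k0 in range(SIZE):                                   # step 1: guess K0
        table = {}
        for a in range(1 << (N_BITS // 2)):                  # step 2: list for F*
            diff = f2[f1[a] ^ k0] ^ f2[f1[a ^ delta] ^ k0]
            table.setdefault(diff, []).append(a)
        for p in range(SIZE):                                # step 3: build the pair
            x = f0[p ^ k0]
            p2 = f0_inv[x ^ delta] ^ k0
            dy = f3_inv[oracle(p) ^ k0] ^ f3_inv[oracle(p2) ^ k0]
            for a in table.get(dy, []):                      # step 4: look up dy
                for v in (a, a ^ delta):
                    k1 = v ^ x
                    # step 5: confirm K0 || K1 on known plaintexts
                    if all(encrypt(steps, k0, k1, q) == c for q, c in known):
                        return k0, k1
    return None
```

The outer loop over $K_0$ contributes the factor $2^n$ and the inner Daemen-style search the factor $2^{n/2}$, which is where the overall $2^{3n/2}$ comes from.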

3.4 Extending the Attack to More Steps 

In this section, we discuss how the attacks can be extended to more steps of LED.
First, we show that by exploiting differential properties of the STEP function $F$, it
might be possible to extend the attacks on LED-64 by one or two steps. Moreover,
the attack on 4 steps can also be used in a related-cipher attack [?] with the
related-key setting on full LED-128. We present our observation in the full version
of our paper [?].


Fig. 3. Attack on LED-64 with t = 3

In the following, we show how the attack can be extended to $t$ steps of LED-64.
The attack is based on the assumption that one can find a good related-key
differential for the first $t - 2$ steps such that one gets a zero difference after
the key addition of step $t - 2$. Then one can use Daemen's attack on the last 2
steps to recover the key. In the attack on 3 steps, a differential with good
probability in $F_0$ is used, see Figure 3. The attack can be summarized as follows.

1. Assume we are given two related keys $K_0$ and $K_0' = K_0 \oplus \Delta$ and furthermore
the differential $\Delta^* \to \Delta$ for $F_0$ holds with probability $p > 2^{-64}$.

2. For $2^{(n+p)/2}$ values $a$ compute $\Delta F_2 = F_2(a) \oplus F_2(a \oplus \Delta)$ and save the pair
$(\Delta F_2, a)$ in a list $L$.

3. Choose an arbitrary $P$ and $P' = P \oplus \Delta^* \oplus \Delta$ and ask for the ciphertexts $C$
and $C'$.

4. Compute $\Delta C = C \oplus (C' \oplus \Delta)$ and check if $\Delta C$ is in the list $L$ to get $a$.

- If $\Delta C$ is in the list $L$ then a candidate for $K_0$ is found, $K_0 = F_2(a) \oplus C$.

- Else go back to step 3.

After repeating steps 3-4 about $2^{(n+p)/2}$ times, the expected number of matches
in the list $L$ (and hence candidates for the key $K_0$) is $1/p$. Since the differential
in $F_0$ will hold with probability $p$, for one of these matches we will have $\Delta F_1 = 0$.

Table 1. Summary of the attacks on LED

algorithm   # STEP      complexity      memory          attack type          reference
            functions                   complexity
LED-64      3           $2^{(n+p)/2}$   $2^{(n+p)/2}$   related-key          Section 3.4
            4           $2^{(n+p)/2}$   $2^{(n+p)/2}$   related-key          Section 3.4
LED-128     4           $2^{3n/2}$      $2^{n/2}$       single-key           Section 3.3
            6           $2^{3n/2}$      $2^{n/2}$       related-key          Section 3.3
            12          $2^{(n+p)/2}$   $2^{(n+p)/2}$   related-key-cipher   [?]

Hence, one will find the right key after testing all candidates for $K_0$ resulting
from the $1/p$ matches in the list $L$. The complexity and memory requirements
of the attack depend on $p$, i.e. $2^{(n+p)/2}$.

The attack on three steps can be extended to four steps of LED-64. Assume we
can find a good iterative differential for $F_1$ that holds with probability $p$. Then
this differential can be easily extended to a differential for the first 2 steps with
the same probability (see Figure 4), resulting in an attack on 4 steps of LED-64
with complexity of $2^{(n+p)/2}$ and similar memory requirements.


Fig. 4. Attack on LED-64 with t = 4

In Table 1 we summarize the attacks on LED that are given in Section 3. We
will discuss in the following sections how to find good (iterative) differential
characteristics for one step of LED that can be used in the attacks on three and
four steps.

4 Differential Analysis and Plateau Characteristics 

In this section, we start with some definitions that will be helpful to understand
the rest of the paper. We then give an introduction to the previous work
on AES [?] and describe how we can use this method to find two/four round
characteristics efficiently (and the corresponding right pairs).


4.1 Characteristics and Differentials 

Differential cryptanalysis [2] is one of the most powerful techniques used in the
analysis of block ciphers, hash functions, stream ciphers, etc. It investigates how
an input difference (generally XOR) propagates through the target function.
The concept of differential cryptanalysis starts with analyzing the components
of the function, mostly focusing on S-boxes since they are the smallest nonlinear
building block. In the analysis, we call an S-box active if it has a non-zero input
difference, otherwise we call it passive.

A differential characteristic $Q = (\Delta_0, \Delta_1, \ldots, \Delta_m)$ is a sequence of differences
through various stages of the encryption. The sequence consists of an input
difference $\Delta_0$, followed by the output differences of all the steps $(\Delta_1, \Delta_2, \ldots, \Delta_m)$.

A differential [?] over a map is denoted by $(\Delta_0, \Delta_m)$ where $\Delta_0$ is the input
difference and $\Delta_m$ is the output difference. The differential probability
$DP(\Delta_0, \Delta_m)$ of a differential over a map $f$ is the fraction of pairs with input
difference $\Delta_0$ that have output difference $\Delta_m$.

For a keyed map, we can define differential probabilities $DP[k](\Delta_0, \Delta_m)$ and
$DP[k](Q)$ for each value $k$ of the key. Then, the expected differential probability
(EDP) is the average of the differential probability over all keys. The weight
of a differential or a characteristic is minus the binary logarithm of their EDP.
Moreover, we define the height of a possible differential or a characteristic as
the binary logarithm of the number of their right pairs satisfying $(\Delta_0, \Delta_m)$ for
a fixed key.
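For an unkeyed map such as a single S-box, these definitions can be made concrete with a difference distribution table. The following sketch assumes the 4-bit PRESENT S-box (which LED reuses, as stated in Section 2) with the table values as we recall them from the PRESENT specification; the helper names are illustrative. Since no key is involved, the EDP coincides with the DP here.

```python
from math import log2

# PRESENT S-box (assumed values, recalled from the PRESENT specification).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def difference_distribution_table(sbox):
    """table[a][b] = number of inputs x with S(x) xor S(x xor a) = b."""
    size = len(sbox)
    table = [[0] * size for _ in range(size)]
    for a in range(size):
        for x in range(size):
            table[a][sbox[x] ^ sbox[x ^ a]] += 1
    return table

def weight(table, a, b):
    """Minus log2 of the differential probability, or None if (a, b) is impossible."""
    count = table[a][b]
    if count == 0:
        return None
    return -log2(count / len(table))

DDT = difference_distribution_table(SBOX)
```

For example, the input difference 1 leads to the output difference 9 for four of the sixteen inputs, so this differential has probability $2^{-2}$ and weight 2.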

A differential characteristic through the AES-like (including LED) super boxes
consists of a sequence of four differences: the input difference $a$, the difference
after the first substitution $b$, the difference after the mixing step which is equal
to the difference after the round (key) constant addition $d$, and the output
difference after the second substitution $e$. These characteristics are denoted by
$Q = (a, b, d, e)$.

It can be shown that SMS is a map whose branch number is 5. Therefore, a
characteristic over a mega-box consists of 5 to 8 sub-characteristics, each over
an LED super box. We denote the characteristics over the first and the second
layer of super boxes by $(a, b, d, e)$ and $(f, g, i, j)$, respectively.

4.2 The Maximum Expected Differential Probability of LED 

Differential cryptanalysis plays a crucial role in the analysis of symmetric key 
components since most of the cryptanalysis techniques are based on it. Therefore, 
giving bounds for resistance against differential cryptanalysis is one of the first 
steps in the evaluation of a design. In LED, the AES-like structure in the STEP 
function makes it possible to apply the previous work of [20] to bound the MEDP.
By a straightforward computation of the formula stated in [20, Theorem 4],
the designers compute the bound for the MEDP as $2^{-32}$. This bound can be
improved by considering the STEP function as a mega-box and then using [20,
Theorem 1] to bound the MEDP of LED as $2^{-41.75}$.

In this bound, $DP^{S_i}(x, y)$ is the probability of the characteristic $(x, y)$ for the
$i$-th super box obtained from the Difference Distribution Table (DDT). This result
improves the approximations used in [13, Table 1]. We provided the bound for the
first STEP function; the results for the other super boxes are similar.

4.3 Planar Differentials 

Let $\gamma$ be a map and let $F_{(a,b)}$, $G_{(a,b)}$ be the sets that contain the input values,
respectively the output values, for the right pairs of the differential $(a, b)$, i.e.,
$F_{(a,b)} = \{x \mid \gamma(x) + \gamma(x + a) = b\}$ and $G_{(a,b)} = \gamma(F_{(a,b)})$. A differential $(a, b)$ is
called a planar differential if $F_{(a,b)}$ and $G_{(a,b)}$ form affine subspaces [?]. In that
case, we can write:

$F_{(a,b)} = p + U_{(a,b)}$
$G_{(a,b)} = q + V_{(a,b)}$,

where $U_{(a,b)}$ and $V_{(a,b)}$ are uniquely defined vector spaces, $p$ any element in $F_{(a,b)}$
and $q$ any element in $G_{(a,b)}$. Note that, if a differential $(a, b)$ has exactly two or
four right pairs, then it is always planar [?].
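Planarity can be checked by brute force for a small map. The sketch below is an illustration under assumptions: it uses the 4-bit PRESENT S-box (table values as we recall them from the PRESENT specification) in the role of $\gamma$, and the helper names are hypothetical. It collects $F_{(a,b)}$ and $G_{(a,b)}$ and tests whether each is an affine subspace, i.e. whether the set becomes closed under XOR after translation by one of its elements.

```python
# PRESENT S-box (assumed values), playing the role of the map gamma.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def right_inputs(sbox, a, b):
    """F_(a,b): inputs x with S(x) xor S(x xor a) = b."""
    return {x for x in range(len(sbox)) if sbox[x] ^ sbox[x ^ a] == b}

def is_affine(points):
    """True if `points` is a non-empty affine subspace over GF(2)."""
    if not points:
        return False
    p = next(iter(points))
    shifted = {x ^ p for x in points}        # translate so that 0 is included
    return all((u ^ v) in shifted for u in shifted for v in shifted)

def is_planar(sbox, a, b):
    """True if both F_(a,b) and G_(a,b) = S(F_(a,b)) are affine subspaces."""
    inputs = right_inputs(sbox, a, b)
    outputs = {sbox[x] for x in inputs}
    return is_affine(inputs) and is_affine(outputs)
```

Because every possible non-trivial differential of this S-box has exactly two or four right inputs, the check succeeds for all of them, in line with the remark above.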

Plateau characteristics [?] are a special type of characteristics whose
probability for each value $k$ of the key, $DP[k](Q)$, depends on the key and can have
only two values. For a fraction $2^{n_b - (\mathrm{weight}(Q) + \mathrm{height}(Q))}$ of the keys
$DP[k](Q) = 2^{\mathrm{height}(Q) - n_b}$ and for all other keys it is zero. Note that the height is
independent of the key.

The Two-Round Plateau Characteristic Theorem states that a characteristic
$Q = (a, b, c)$ over a map consisting of two steps with a key addition in between, in
which the differentials $(a, b)$ and $(b, c)$ are planar, is a plateau characteristic with
$\mathrm{height}(Q) = \dim(V_{(a,b)} \cap U_{(b,c)})$.

4.4 Algorithm for Number of Right Pairs in a Plateau 
Characteristic 

Here, we describe the algorithm to find the number of right pairs of a given
characteristic $Q = (a, b, d, e)$ through a super box. If the sub-characteristics $(a, b)$
and $(d, e)$ are planar then we can use the Two-Round Plateau Characteristic
Theorem to compute the right pairs. Our aim in the algorithm is to build the
matrix $B$ containing the basis vectors of $M(V_{(a,b)})$ and $U_{(d,e)}$, where $M$ is the
mixing operation and $M(V) = \{M(v) \mid v \in V\}$. We denote vectors by rows of $n_b$
bits.

The first step of our algorithm is to determine $V_{(a,b)}$ and $U_{(d,e)}$. Since the
super box is a set of $m$ parallel maps, $V_{(a,b)}$ and $U_{(d,e)}$ can be written as:

$V_{(a,b)} = V_{(a_1,b_1)} \times V_{(a_2,b_2)} \times \cdots \times V_{(a_m,b_m)}$
$U_{(d,e)} = U_{(d_1,e_1)} \times U_{(d_2,e_2)} \times \cdots \times U_{(d_m,e_m)}$

by using Lemma 4 in [?]. Now, if $|G_{(a_i,b_i)}| > 0$, we are interested in the
output values of the right pairs.

- If $|G_{(a_i,b_i)}| = 2$, then the right pairs have output values in the set $\{q + \{0, b_i\}\}$
for some $q$ in $G_{(a_i,b_i)}$, the basis vector for $V_{(a_i,b_i)}$ being $b_i$.

- If $|G_{(a_i,b_i)}| = 2^k$ where $2 \le k < n$, then $V_{(a_i,b_i)} = \langle b_i, \beta_i^1, \ldots, \beta_i^{k_i} \rangle$ and
hence $V_{(a_i,b_i)}$ is said to be spanned by $b_i$ and the $\beta_i^j$'s.

- If $(a_i, b_i) = (0, 0)$ then $G_{(a_i,b_i)}$ covers the whole space and $V_{(a_i,b_i)} =
\langle w_0, w_1, \ldots, w_{n-1} \rangle$ where $w_j$ is a coordinate vector (i.e. a vector with 1
at position $j$ and zero at all other positions) and $V$ is the standard basis.

Similarly, if $|F_{(d_i,e_i)}| > 0$, we are interested in the input values of the right pairs.
When we find the right pairs for each parallel map we can compute the height by
using Algorithm 1. The number of dependent rows in $B$ gives $\dim(M(V_{(a,b)}) \cap
U_{(d,e)})$ which is equal to the height.

Algorithm 1 calls the following subroutines. Add(v) adds the vector $v$ as a
new row to the matrix $B$. RowReduce is the Gaussian Elimination and RowCount
gives the number of nonzero rows of a matrix.


Algorithm 1 Algorithm to compute the height of a given plateau characteristic

Input: Characteristic $Q = (a,b,d,e)$ with EDP($Q$) > 0
Output: height($Q$)

 1: procedure PRECOMPUTE
 2:   for i = 1 → m do
 3:     Compute $V_{(a_i,b_i)} = \langle b_i, \beta_i^1, \ldots, \beta_i^{k_i} \rangle$ and $U_{(d_i,e_i)} = \langle d_i, \delta_i^1, \ldots, \delta_i^{k'_i} \rangle$
 4:   end for
 5: end procedure

 6: procedure HEIGHT
 7:   // at the input of Mixing
 8:   for i = 0 → m do
 9:     if $b_i = 0$ then
10:       for j = 0 → n do
11:         Add($M(w_{4i+j})$)
12:       end for
13:     else if $b_i > 0$ then
14:       Add($M(b_i)$)
15:       if $|G_{(a_i,b_i)}| > 2$ then
16:         for j = 1 → $k_i$ do
17:           Add($M(\beta_i^j)$)
18:         end for
19:       end if
20:     end if
21:   end for
22:   // at the output of Mixing
23:   for i = 0 → m do
24:     if $d_i = 0$ then
25:       for j = 0 → n do
26:         Add($w_{4i+j}$)
27:       end for
28:     else if $d_i > 0$ then
29:       Add($d_i$)
30:       if $|G_{(d_i,e_i)}| > 2$ then
31:         for j = 1 → $k'_i$ do
32:           Add($\delta_i^j$)
33:         end for
34:       end if
35:     end if
36:   end for
37:   $B'$ = RowReduce($B$)
38:   return height($Q$) = RowCount($B$) − RowCount($B'$)
39: end procedure

The algorithm also gives us an insight on how to find the right pairs, which can
be determined by intersecting the affine spaces involved (one of them being
$G_{(b,c)} \oplus k$). This can be done efficiently by preparing the set of linear
equations to solve. We would like to emphasize that the right pairs exist only for
a fraction of the keys and that their values differ depending on the key. On the
other hand, if a constant addition is used instead of the key addition operation
in the cipher, then a solution is not guaranteed to exist.
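
The last two steps of the HEIGHT procedure (row reduction and counting dependent rows) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: vectors are encoded as Python integers (bit $j$ is coordinate $j$), the two input lists are assumed to be bases (linearly independent, nonzero), and the names `gf2_rank` and `height` are ours, not the paper's.

```python
def gf2_rank(rows):
    """Rank over GF(2) of vectors encoded as ints (bit j = coordinate j)."""
    pivots = {}  # leading-bit position -> reduced row with that leading bit
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row  # new independent row
                break
            row ^= pivots[top]     # eliminate the leading bit and continue
    return len(pivots)


def height(mixed_input_basis, output_basis):
    """RowCount(B) - RowCount(RowReduce(B)) for B built from both bases.

    mixed_input_basis: basis of M(V_(a,b)) (assumed already passed through M).
    output_basis:      basis of U_(d,e).
    The result is dim(M(V_(a,b)) intersected with U_(d,e)).
    """
    B = list(mixed_input_basis) + list(output_basis)
    return len(B) - gf2_rank(B)


if __name__ == "__main__":
    # Toy 4-bit example: span{0011, 0100} and span{0111, 1000} share {0, 0111}.
    print(height([0b0011, 0b0100], [0b0111, 0b1000]))
```

The dictionary of pivots plays the role of the row-reduced matrix $B'$: a row that reduces to zero is a dependent row and contributes one to the height.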

5 Non-plateau Characteristics: LED Mega-Box 

As we mentioned in Section 2, two rounds of LED can be considered as a super
box and four rounds is defined as a mega-box. Let $(a, b, d, e)$ and $(f, g, i, j)$ denote
the characteristics through the super boxes at the input and the output of SMS,
respectively. Since the super boxes are key independent, we consider them as
16-bit S-boxes. This allows us to omit the middle values $(b, d)$ and $(g, i)$ and use
the differentials $(a, e)$ and $(f, j)$ in our analysis.

In order to use the two-round plateau characteristic theorem, it is required
that the set of output values $G_{(a,e)}$ and the set of input values $F_{(f,j)}$ for the right
pairs be affine spaces (planar). However, this is not always guaranteed when
the number of right pairs is greater than 4. Although the difference between the 
values of each pair is known and constant, some extra conditions between the 
pairs are also required for a set to become affine/planar. Therefore, we have to 
work with a union of affine spaces in order to compute the number of right pairs 
of a given characteristic. In the following, we will denote by height* the binary 
logarithm of the maximum number of right pairs of a given characteristic, over 
all values of the key. For a plateau characteristic, height* equals the height. 

The details of our algorithm are given below. An algorithmic description can
be found in Algorithm 2.

Precomputation: The first step of our algorithm is finding $G_{(a,e)}$ and $F_{(f,j)}$
for the given path, and the next step is obtaining the subspace decompositions
of $V_{(a,e)}$ and $U_{(f,j)}$. If $V_{(a_i,e_i)}$ is affine then $V_{(a_i,e_i)} = \langle e, \varepsilon_1, \ldots, \varepsilon_n \rangle$; otherwise
it is a union of smaller vector spaces, i.e., $V_{(a_i,e_i)} = V^1_{(a_i,e_i)} \cup V^2_{(a_i,e_i)} \cup \ldots \cup V^m_{(a_i,e_i)}$
where $m \ge 2$. Therefore, we have to find the corresponding basis vectors (the $\varepsilon_i$'s)
for each subspace. The results are then stored in a list, $L_i$, for each active super
box.

Analysis: We then use the Two-Round Plateau Characteristic Theorem to compute
the height using the basis vectors obtained in the precomputation phase.
Since a solution exists only for a fraction of the constant values, we check
whether the given round constant is in the solution set or not. This step can also
be done by solving a system of linear equations as in the two-round case, but this
time the equations are obtained from the SMS layer and the basis vectors of the
super boxes.

Here, we would like to emphasize that a solution does not always exist
for the round constant of LED. Denote by $K_q$ the set of values $k$ such that
DP[$k$]($q$) > 0. Since constants are used in the round function of LED, it is not
guaranteed that the round constant $c_r \in K_q$ for all $q$. Therefore, the algorithm
gives an upper bound for height*($Q$). If key addition were used in the round
function rather than constant addition, it could be possible to find a key value
$k \in K_q$ for all $q$ satisfying the upper bound.


Algorithm 2 Algorithm to compute the height* of a four-round characteristic

Input: Characteristic $Q = (a, e, f, j)$
Output: Upper bound for height*($Q$)

 1: procedure PRECOMPUTE
 2:   $L_0 = L_1 = \ldots = L_7 = \emptyset$
 3:   for i = 0 → 3 do
 4:     Compute $G_{(a_i,e_i)}$ and $F_{(f_i,j_i)}$
        $\bigcup_m V^m_{(a_i,e_i)}$ = Decompose($V_{(a_i,e_i)}$) and $\langle \varepsilon^m_1, \ldots, \varepsilon^m_{d_m} \rangle = V^m_{(a_i,e_i)}$, $d_m = |V^m_{(a_i,e_i)}|$
        $\bigcup_n U^n_{(f_i,j_i)}$ = Decompose($U_{(f_i,j_i)}$) and $\langle \varepsilon^n_1, \ldots, \varepsilon^n_{d_n} \rangle = U^n_{(f_i,j_i)}$, $d_n = |U^n_{(f_i,j_i)}|$
 5:     Store($L_i$, $\{(a_i, e_i), \varepsilon^m_1, \ldots, \varepsilon^m_{d_m}\}$) and Store($L_{4+i}$, $\{(f_i, j_i), \varepsilon^n_1, \ldots, \varepsilon^n_{d_n}\}$)
 6:   end for
 7: end procedure

 8: procedure ANALYZE
 9:   count = 0
10:   for all $q \in L_0 \times L_1 \times \ldots \times L_7$ do
11:     h = HEIGHT($q$)
12:     count = count + $2^h$
13:   end for
14:   return $\log_2$(count)
15: end procedure

On the other hand, if key addition were used, Algorithm 2 could not
be applied immediately, since the lists $L_i$ would depend on the key values and
would not be unique. This would require recomputation of the lists for each key
value, increasing the complexity of the algorithm.

Note that, since height* for four rounds is the summation over all possible
decompositions $q \in L_0 \times L_1 \times \ldots \times L_7$ of the characteristic $Q$, height*($Q$) is not
guaranteed to be an integer, although height($q$) is an integer for all $q$.

In Algorithm 2, Store adds input/output differences and the basis vectors
$\{\varepsilon_1, \varepsilon_2, \ldots\}$ to the list $L$. HEIGHT is given in Algorithm 1, used with parameters
$m = 4$ and $n = 16$.
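
The ANALYZE procedure can be sketched in Python as below; it also shows why the result need not be an integer. This is a minimal sketch under assumptions: `height_fn` stands in for the HEIGHT procedure, and the toy lists and the toy height function in the usage lines are hypothetical placeholders, not data from the paper.

```python
import math
from itertools import product


def height_star_upper_bound(lists, height_fn):
    """log2 of the sum of 2**height(q) over every combination q of list entries.

    lists:     one list of decompositions per active super box (L_0, ..., L_7).
    height_fn: callable returning the (integer) height of one combination q.
    """
    count = 0
    for q in product(*lists):
        count += 2 ** height_fn(q)
    return math.log2(count)


if __name__ == "__main__":
    # Hypothetical toy input: two lists with two entries each, and a toy
    # height function that just adds the entries of q.
    toy_lists = [[0, 1], [0, 1]]
    print(height_star_upper_bound(toy_lists, lambda q: sum(q)))
```

Each individual height is an integer, but the logarithm of a sum of powers of two is in general not, which matches the remark above about height*($Q$).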


6 Application of Algorithms 1 and 2

In this section, we give two examples to demonstrate how Algorithm 1 and
Algorithm 2 work. These examples can directly be used with the attacks described
earlier in the paper. We do not claim that these are the best characteristics in
terms of probability for the STEP function of LED that one can find. For both
examples, we fix the number of active S-boxes to 25 for four rounds of LED, since
we know from previous work that all the characteristics with high probability
are expected to have a low weight and a low number of active S-boxes. This
also allows us to reduce the time and memory complexities of our algorithm and
make the computation feasible.

Fig. 5. (a) Path for iterative characteristics of the LED cipher (b) Mega-box
representation of the same path

6.1 Iterative Characteristics 

Our aim is to find iterative characteristics (i.e., characteristics that have the same
input and output difference) for the STEP function of the LED block cipher. We
show that it is possible to obtain multiple iterative characteristics by using the
16-bit boxes and the two-round plateau characteristic theorem in $2^{16}$ time and
around $5 \times 2^{17}$ memory. In terms of efficiency, this computation can be compared
with the inbound technique of the rebound attack. The main advantage of
our computation is that many characteristics can be found, whereas with the
rebound attack the expected number of characteristics that we find equals one,
using the same time complexity and slightly less data complexity.

In our analysis we used the differential path given in Figure 5. It is possible
to adapt the algorithm for the other possible differential paths. The algorithm
is summarized as follows:


Precomputation: For each of the active super boxes, obtain the differentials
$(a_i, e_i)$ (or $(f_i, j_i)$) for the given path and find the corresponding right pairs
$G_{(a_i,e_i)}$ (or $F_{(f_i,j_i)}$). Then compute their affine subspace decomposition and the
corresponding basis vectors. Store the input/output differences together with
the basis vectors in a list. We denote these lists as $L_0, L_1, L_2, L_3$ for the super
boxes at the input and $L_4$ for the super box at the output of the SMS layer. Note
that this calculation is done for all possible differentials. Each list has around
$2^{17}$ elements, therefore the total memory requirement of this step is $5 \times 2^{17}$.

Algorithm 3 Compute iterative characteristics

Input: Precomputed tables $L_i$ where $i \in \{1, \ldots, 5\}$
Output: All the iterative characteristics with their height*

 1: for all $(e, f) \in S$ do
 2:   if $(f_0, j_0) \in L_5$ then
 3:     $\Delta = MC \circ SR \circ AC(j)$
 4:     $a = SR \circ AC(\Delta)$
 5:     if $(a_i, e_i) \in L_i$ for $1 \le i \le 4$ then
 6:       h = HEIGHT*($Q$)
 7:       Output $Q = (a, e, f, j)$ and h
 8:     end if
 9:   end if
10: end for


Analysis: We start from the four MixColumnsSerial operations in the SMS
layer. Each of them has only one 4-bit word active at the output, hence we have
$15^4 \approx 2^{16}$ possibilities for the differences at $f$ (call the set of possibilities $S$).
For each of these differences, we obtain the possible differences at $j$ by using
the precomputed list $L_4$. Then, we compute $(MC \circ SR \circ AC)(j) = \Delta$, which is
the output difference after four rounds of the STEP function and is also equal
to the input difference of the STEP function since we are interested in iterative
characteristics. We make one more computation, $(SR \circ AC)(\Delta)$, to obtain the
difference at $a$. Note that by choosing a difference for $f$, we have already fixed
the difference at $e$. We then check whether $(a_i, e_i)$ is in the list $L_i$ for $0 \le i \le 3$.
If it is for all $i$, we use Algorithm 2 to compute the height* and find the
right pairs.

Results: In our analysis we found 240 iterative characteristics for the pattern
given in Figure 5, but not all of them have a solution for the round constants of
the LED block cipher. One of these characteristics is given below. It has 6 right
pairs; the pairs themselves are given in a separate reference.


a   0x6000 0x0003 0x0070 0x0000
e   0x6962 0x5848 0x46A3 0x5CBF
f   0x943C 0x0000 0x0000 0x0000
j   0x8000 0x0000 0x0000 0x0000


6.2 Characteristics with High Height* 

In this section, our aim is to find characteristics with high height* for the STEP
function of the LED block cipher. We show that it is possible to obtain such
characteristics by using a similar algorithm to Algorithm 3 with $2^{16}$ time complexity
and $5 \times 2^{17}$ memory complexity. In our analysis we focused on the differential
path given in Figure 6 and searched for characteristics whose height* is greater
than 5.

Fig. 6. (a) Characteristics of the LED cipher with high height* (b) Mega-box
representation of the same characteristics


Precomputation: All possible differentials together with the basis vectors of
their affine space decomposition are stored in the lists $L_1, L_2, L_3$ for each of the
super boxes at the input and in the lists $L_4, L_7$ for the super boxes at the
output of the SMS layer. Again, each list has around $2^{17}$ elements, and the total
memory requirement of this step is $5 \times 2^{17}$.

Analysis: We start from the two active MixColumnsSerial operations in the
SMS layer. Each of them has two 4-bit words active at the output, hence we have
$(15^2)^2 \approx 2^{16}$ possibilities for the differences at $f$. For each of these possibilities,
we obtain the possible differences at $a$ by using the precomputed lists $L_1$, $L_2$ and
$L_3$. Similarly, the possible differences at $j$ are obtained by using the lists $L_4$ and
$L_7$. We then use Algorithm 2 to compute the height and find the right pairs.

Results: Assume that $\dim(V_{(a,e)}) > 0$ and $\dim(U_{(f,j)}) > 0$; then we can write
$V_{(a,e)} = \bigcup_m V^m_{(a,e)}$ and $U_{(f,j)} = \bigcup_n U^n_{(f,j)}$. We define a partition by $Q_{mn}$ where
$Q_{mn} = SMS(V^m_{(a,e)}) \cap U^n_{(f,j)}$. Then we know that $\text{height}^*(Q) \le \log_2(\sum_{m,n} 2^{\dim(Q_{mn})})$
(see Algorithm 2). In our analysis we observed that it is not easy to find a
partition whose height is greater than six, but by combining all partitions, we were
able to find characteristics which have height* greater than eleven or twelve. One
example of such characteristics is provided below.


a   0x0000 0x0F91 0x2F0B 0x2803
e   0x0000 0xC00D 0x8F00 0x0F50
f   0x0CD0 0x0000 0x0000 0x00C8
j   0x8C07 0x0000 0x0000 0x50BF


The upper bound for height* is computed as 12.16 by using the formula.
However, not all partitions have a solution for the given round constant, and
we obtain only 1026 right pairs for the round constants used in LED. We also
computed the number of right pairs by changing the round constant used in
round three of the STEP function. The number of right pairs is computed as
$1024 \pm \varepsilon$ where $\varepsilon < 116$ for all constants.

To sum up, we not only introduced a new method that can be useful in the
security evaluation of AES-like structures, but also showed that by using this
method it is possible to obtain characteristics that can be used to attack LED
(see the attack sections earlier in the paper).

7 Future Work and Open Problems 

The analysis of super boxes and mega-boxes plays an important role in the
cryptanalysis of AES-like ciphers. In this paper, we focused on characteristics for the
block cipher LED with 25 active S-boxes. Since it is not feasible to compute the 
whole distribution of the characteristics for four rounds of LED, we focus only 
on characteristics that may have many right pairs. Therefore, our results cover 
characteristics with high height* and iterative characteristics with a fixed pat- 
tern. The examples given in this paper are the best ones that we computed. But 
still, it is possible to cover other patterns with 25 active S-boxes and they might 
give better results and at the same time result in improvements of our attacks. 

We want to note that the algorithms given in this paper can also be used to
compute the differentials for constructions using four rounds of AES as internal
building block, such as Pelican, giving new insights on these designs. Moreover,
these algorithms might also be used in the computation of the inbound phase of
the rebound attack.

Abstract. This paper presents a block cipher that is optimized with
respect to latency when implemented in hardware. Such ciphers are de- 
sirable for many future pervasive applications with real-time security 
needs. Our cipher, named PRINCE, allows encryption of data within 
one clock cycle with a very competitive chip area compared to known 
solutions. The fully unrolled fashion in which such algorithms need to be 
implemented calls for innovative design choices. The number of rounds 
must be moderate and rounds must have short delays in hardware. At 
the same time, the traditional need that a cipher has to be iterative with 
very similar round functions disappears, an observation that increases 
the design space for the algorithm. An important further requirement is 
that realizing decryption and encryption results in minimum additional 
costs. PRINCE is designed in such a way that the overhead for decryption
on top of encryption is negligible. More precisely, for our cipher it
holds that decryption for one key corresponds to encryption with a
related key. This property, which we refer to as $\alpha$-reflection, is of
independent interest and we prove its soundness against generic attacks.

1 Introduction 

The area of lightweight cryptography, i.e., ciphers with particularly low
implementation costs, has drawn considerable attention over the last years. Among the
best studied algorithms are the block ciphers CLEFIA, Hight, KATAN, KTANTAN,
Klein, mCrypton, LED, Piccolo and PRESENT, as well as the stream ciphers
Grain, Mickey, and Trivium. Particular interest in lightweight symmetric ciphers
is coming from industry, as is evident from the adoption of CLEFIA and
PRESENT in the ISO/IEC Standard 29192-2. The dominant metric according to
which the majority of lightweight ciphers have been optimized is chip area,
typically measured in gate equivalents (GE), i.e., the cipher area normalized to
the area of a 2-input NAND gate in
a given standard cell library. This is certainly a valid optimization objective in 
cases where there are extremely tight power or cost constraints, in particular pas- 
sive RFID tags. However, depending on the application, there are several other 
implementation parameters according to which a cipher should have lightweight 
characteristics. There are several important applications for which low-latency
encryption and instant response time are highly desirable, such as instant
authentication or block-wise read/write access to memory devices, e.g., solid-state
hard disks. There are also embedded applications where current block ciphers in
multiple-clock architectures could be sufficiently fast, but the needed high clock 
rates are not supported by the system. For instance, in many FPGA designs 
clock rates above 200 MHz are often difficult to realize. It can also be antici- 
pated that given the ongoing growth of pervasive computing, there will be many 
more future embedded systems that require low-latency encryption, especially 
applications with real-time requirements, e.g., in the automotive domain.
Moreover, previous studies show that low latency goes hand in hand with energy
efficiency, another crucial criterion in many (other) applications.

For all these cases, we would like to have symmetric ciphers that can instantaneously
encrypt a given plaintext, i.e., the entire encryption and decryption should take 
place within the shortest possible delay. This seemingly simple problem poses a 
considerable challenge with today’s cryptosystems — in particular if encryption 
and decryption should both be available on a given platform. Software implemen- 
tations of virtually all strong ciphers take hundreds or thousands of clock cycles, 
making them ill suited for a designer aiming for low-latency cryptography. In the 
case of stream ciphers implemented in hardware, the high number of clock cy- 
cles for the initialization phase makes them not suitable for this task, especially 
when secret keys need to be regularly changed. Moreover, if we want to encrypt
small blocks selected at random (e.g., encryption of sectors on solid-state disks),
stream ciphers are not suited (a possible exception being random-access stream
ciphers such as Salsa). This leaves block ciphers as the remaining viable
solution. However, the round-based, i.e., iterative, nature of virtually all existing
block ciphers, as shown for the case of AES, makes low-latency implementation
a non-trivial task. A round-based hardware architecture of AES-128 requires
ten clock cycles to output a ciphertext, which we do not consider instantaneous
as it is still too long for some applications. As a remedy, the ten rounds can be
loop-unrolled, i.e., the circuit that realizes the single round is repeated ten times.
Now, the cipher returns a ciphertext within a single clock cycle, but at the
cost of a very long critical path. This yields a very slow absolute response time
and low clock frequencies, e.g., in the range of a few MHz. Furthermore, the unrolled
architecture has a high gate count in the range of several tens of thousands of GE,
implying high power consumption and costs. Both features are undesirable,
especially if one considers that many of the applications for instantaneous ciphers
are in the embedded domain. Following the same motivation and reasoning as
above, a previous study compares several lightweight ciphers with respect to
latency and concludes by calling for new designs that are optimized for low latency.


Our Contribution. Based on the above discussion our goal is to design a 
new block cipher which is optimized with respect to the following criteria if 
implemented in hardware: 

1. The cipher can perform instantaneous encryption: a ciphertext is computed
within a single clock cycle. There is no warm-up phase.

2. If implemented in modern chip technology, low delays resulting in moderately 
high clock rates can be achieved. 

3. The hardware costs are moderate (i.e., considerably lower than fully unrolled 
versions of AES or PRESENT). 

4. Encryption and decryption should both be possible with low costs and 
overhead. 

We would like to remark that existing lightweight ciphers such as PRESENT 
do not fulfill Criteria 2 and 3 (low delay, small area) due to their large number 
of rounds. In order to fulfill Criterion 4, one needs to design a cipher for which 
decryption and encryption use (almost) identical pieces of hardware. This is an 
important requirement since the unrolled nature of instantaneous ciphers leads 
to circuits which are large and it is thus clearly advantageous if large parts of 
the implementation can be used both for encryption and decryption. 

Besides designing a new lightweight cipher that is for the first time optimized
with respect to the goals above, PRINCE has several innovative features that
we would like to highlight.

First, a fully unrolled design increases the possible design choices enormously.
With a fully unrolled cipher, the traditional need that a cipher has to be iterative
with very similar round functions disappears. This in turn allows us to efficiently
implement a cipher where decryption with one key corresponds to encryption
with a related key. This property, which we refer to as $\alpha$-reflection, is of
independent interest and we prove its soundness against generic attacks. As a
consequence, the overhead of implementing decryption over encryption becomes
negligible. Note that previous approaches to minimizing the overhead of decryption
over encryption, for example in the ciphers NOEKEON and ICEBERG, usually
require a multiplexer in each round. While for a round-based implementation this
does not make a difference, our approach is clearly preferable for a fully unrolled
implementation, as we require a multiplexer only once at the beginning of the
circuit.

Another difference to known lightweight ciphers like PRESENT is that we
balance the cost of an Sbox-layer and the linear layer. As it turns out, optimizing
the cost of the Sbox chosen has a major influence on the overall cost of the
cipher. As an Sbox that performs well in one technology does not necessarily
perform well in another technology, we propose the PRINCE-family of ciphers
that allows one to freely choose the Sbox within a (large) set of Sboxes fulfilling
certain criteria. Our choice for the linear layer can be seen as being in between
a bit-permutation layer as in PRESENT (implemented with wires only) and the
linear layer of AES (implemented with considerable combinatorial logic). At the
expense of only 2 additional XOR-gates per bit over a simple bit-permutation
layer, we achieve an almost-MDS property that helps to prove much better bounds
against various classes of attacks and in turn allows us to significantly reduce the
number of rounds and hence latency.

As a result, PRINCE compares very favorably to existing ciphers. For the
same time constraints and technologies, PRINCE uses 6-7 times less area than
PRESENT-80 and 14-15 times less area than AES-128. In addition to this,
our design uses about 4-5 times less area than other ciphers in the literature
(see Section 5 and in particular Tables 1 and 2 for a detailed comparison and
technology details). To facilitate further study and fairer comparisons, we also
report synthesis results using the open-source standard-cell library NANGATE.
We would also like to mention that, although this is not the main objective of
the cipher, PRINCE compares reasonably well to other lightweight ciphers when
implemented in a round-based fashion (see [10]).

We believe that our consideration can be of major value for industry and 
can at the same time stimulate the scientific community to pursue research on 
lightweight ciphers with different optimization goals. 


Organization of the Paper. We introduce an instance of the PRINCE-family of
ciphers and state our security claims in Section 2. Design decisions are discussed
in Section 3, where we also describe the entire PRINCE-family. We provide
security proofs and evaluations considering cryptanalytical attacks in Section 4.
In Section 5 we finally present implementation results and comparisons with
other lightweight ciphers for a range of hardware technologies. For further
details, including a detailed security analysis against standard attacks as well as
test vectors, we refer to [10].

2 Cipher Description 

PRINCE is a 64-bit block cipher with a 128-bit key. The key is split into two
parts of 64 bits each,

$k = k_0 \| k_1$

and extended to 192 bits by the mapping

$(k_0 \| k_1) \rightarrow (k_0 \| k'_0 \| k_1) := (k_0 \| (k_0 \ggg 1) \oplus (k_0 \gg 63) \| k_1).$

PRINCE is based on the so-called FX construction: the first two subkeys
$k_0$ and $k'_0$ are used as whitening keys, while the key $k_1$ is the 64-bit key for a
12-round block cipher we refer to as PRINCE$_{core}$. We provide test vectors in the
full version of the paper [10].
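
The key extension above can be sketched in Python as follows. This is only an illustrative sketch: the 128-bit key is passed as one integer, and reading $k_0$ as its most significant 64 bits is our assumption about bit ordering; the helper names `ror64` and `expand_key` are ours.

```python
MASK64 = (1 << 64) - 1


def ror64(x, r):
    """Rotate a 64-bit word right by r positions (0 < r < 64)."""
    return ((x >> r) | (x << (64 - r))) & MASK64


def expand_key(k):
    """Split k = k0 || k1 and derive k0' = (k0 >>> 1) xor (k0 >> 63).

    Assumption: k0 is the most significant half of the 128-bit integer k.
    Returns the three 64-bit words (k0, k0', k1).
    """
    k0 = (k >> 64) & MASK64
    k1 = k & MASK64
    k0_prime = ror64(k0, 1) ^ (k0 >> 63)
    return k0, k0_prime, k1


if __name__ == "__main__":
    k0, k0_prime, k1 = expand_key(0x0123456789ABCDEF_FEDCBA9876543210)
    print(f"{k0:016x} {k0_prime:016x} {k1:016x}")
```

Because the derivation of $k'_0$ uses only a rotation, a shift and an XOR, it is linear over GF(2), a fact the related-key remark in the security claims relies on.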



Specification of PRINCE$_{core}$.

The whole encryption process of PRINCE$_{core}$ is depicted below.

[Figure: the PRINCE$_{core}$ encryption process, labeled with the round constants $RC_i$.]


Each round of PRINCE$_{core}$ consists of a key addition, an Sbox-layer, a linear
layer, and the addition of a round constant.


$k_1$-add. Here the 64-bit state is XORed with the 64-bit subkey.


S-Layer. The cipher uses one 4-bit Sbox. The action of the Sbox in hexadecimal 
notation is given by the following table. 


x      0 1 2 3 4 5 6 7 8 9 A B C D E F
S[x]   B F 3 2 A C 9 1 6 7 8 0 E 5 D 4

The Matrices: M/M'-layer. In the M and M'-layer the 64-bit state is
multiplied with a 64 x 64 matrix M (resp. M') defined in Section 3.3.


$RC_i$-add. In the $RC_i$-add step a 64-bit round constant is XORed with the state.
We define the constants used below (in hex notation).

$RC_0$     0000000000000000
$RC_1$     13198a2e03707344
$RC_2$     a4093822299f31d0
$RC_3$     082efa98ec4e6c89
$RC_4$     452821e638d01377
$RC_5$     be5466cf34e90c6c
$RC_6$     7ef84f78fd955cb1
$RC_7$     85840851f1ac43aa
$RC_8$     c882d32f25323c54
$RC_9$     64a51195e0e3610d
$RC_{10}$  d3b5a399ca0c2399
$RC_{11}$  c0ac29b7c97c50dd


Note that, for all $0 \le i \le 11$, $RC_i \oplus RC_{11-i}$ is the constant $\alpha$ = c0ac29b7c97c50dd,
$RC_0 = 0$ and that $RC_1, \ldots, RC_5$ and $\alpha$ are derived from the fractional part of
$\pi = 3.141\ldots$.

From the fact that the round constants satisfy $RC_i \oplus RC_{11-i} = \alpha$ and that
M' is an involution, we deduce that the core cipher is such that the inverse
of PRINCE$_{core}$ parametrized with $k$ is equal to PRINCE$_{core}$ parametrized with
$(k \oplus \alpha)$. We call this property of PRINCE$_{core}$ the $\alpha$-reflection property. It follows
that, for any expanded key $(k_0 \| k'_0 \| k_1)$,

$D_{(k_0 \| k'_0 \| k_1)}(\cdot) = E_{(k'_0 \| k_0 \| k_1 \oplus \alpha)}(\cdot)$

where $\alpha$ is the 64-bit constant $\alpha$ = c0ac29b7c97c50dd. Thus, for decryption one
only has to do a very cheap change to the master key and afterwards reuse the
exact same circuit.
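
The relation between the round constants that underlies the $\alpha$-reflection property can be checked mechanically. The sketch below transcribes the twelve constants from the table as hexadecimal integers; the function name `reflection_holds` is ours.

```python
# Round constants RC_0 .. RC_11, transcribed from the table (hex notation).
RC = [
    0x0000000000000000, 0x13198a2e03707344, 0xa4093822299f31d0,
    0x082efa98ec4e6c89, 0x452821e638d01377, 0xbe5466cf34e90c6c,
    0x7ef84f78fd955cb1, 0x85840851f1ac43aa, 0xc882d32f25323c54,
    0x64a51195e0e3610d, 0xd3b5a399ca0c2399, 0xc0ac29b7c97c50dd,
]
ALPHA = 0xc0ac29b7c97c50dd


def reflection_holds(rc=RC, alpha=ALPHA):
    """Check RC_i xor RC_(11-i) == alpha for every 0 <= i <= 11."""
    return all(rc[i] ^ rc[11 - i] == alpha for i in range(12))


if __name__ == "__main__":
    print(reflection_holds())
```

A check of this kind is also a convenient way to catch transcription errors in the constants, since a single wrong hex digit breaks the relation for one pair.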


Security Claims. For an adversary that is able to acquire $2^n$ plaintext/
ciphertext pairs in a model with a single fixed unknown key $k$, we claim that the
effort to find the key is not significantly less expensive than $2^{127-n}$ calls to the
encryption or decryption function. In Section 4 we give a bound matching this
claim in the ideal cipher model that does consider the special relation between
the encryption and decryption operations. One way to interpret this is that any 
attack violating our security claim will have to use more properties of the cipher 
than the relation between the encryption and decryption operations. 

We explicitly state that we do not have claims in related-key or known- and
chosen-key models as we do not consider them to be relevant for the intended
use cases. In particular, as for any cipher based on the FX construction or on
the Even-Mansour scheme, there exists a trivial distinguisher for PRINCE
in the related-key model: for any difference $\Delta$, the ciphertexts corresponding to
$m$ and $(m \oplus \Delta)$ encrypted under keys $(k_0 \| k_1)$ and $((k_0 \oplus \Delta) \| k_1)$ respectively,
differ by $((\Delta \ggg 1) \oplus (\Delta \gg 63))$ with probability 1.
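
The related-key distinguisher follows from the FX structure alone, which the sketch below illustrates. Note the loud assumption: `toy_core` is a hypothetical stand-in keyed map, NOT PRINCE$_{core}$; any fixed core would give the same ciphertext difference, because the core sees the same input in both encryptions. All function names are ours.

```python
MASK64 = (1 << 64) - 1


def whitening_out(k0):
    """k0' = (k0 >>> 1) xor (k0 >> 63), the output whitening key."""
    return (((k0 >> 1) | (k0 << 63)) & MASK64) ^ (k0 >> 63)


def toy_core(x, k1):
    """Hypothetical stand-in for the core cipher; NOT the real PRINCE_core."""
    return ((x ^ k1) * 0x9E3779B97F4A7C15 + 1) & MASK64


def fx_encrypt(m, k0, k1, core=toy_core):
    """FX construction: whiten with k0, apply the core, whiten with k0'."""
    return core(m ^ k0, k1) ^ whitening_out(k0)


def related_key_difference(m, k0, k1, delta):
    """Ciphertext difference for (m, k0) versus (m xor delta, k0 xor delta)."""
    c1 = fx_encrypt(m, k0, k1)
    c2 = fx_encrypt(m ^ delta, k0 ^ delta, k1)
    return c1 ^ c2


if __name__ == "__main__":
    delta = 0x8000000000000001
    diff = related_key_difference(0x0123456789ABCDEF, 0xFEDCBA9876543210, 0x1111, delta)
    print(diff == whitening_out(delta))
```

The difference in the input whitening cancels the plaintext difference, so only the linear map producing $k'_0$ contributes to the output difference.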

Reduced Versions. Many classes of cryptanalytic attacks become more difficult
with an increased number of rounds. In order to facilitate third-party
cryptanalysis and estimate the security margin, reduced-round variants need to
be considered. We encourage the study of round-reduced variants of PRINCE where
the symmetry around the middle is kept and rounds are added in an inside-out
fashion, i.e., for every additional round $R_i$ its inverse is also added. Another
natural way to reduce PRINCE is to consider the cipher without the key whitening
layer, PRINCE$_{core}$.

3 Design Decisions 

In this section we explain our design decisions. First note that an SP-network
is preferable over a Feistel-cipher, since a Feistel-cipher operates only on half
the state, often resulting in a higher number of rounds. In order to minimize the
number of rounds and still achieve security against linear and differential attacks,
we adopted the wide-trail strategy. As not all round functions have to be
identical for a cipher aiming for a fully unrolled implementation such as PRINCE, it
is very tempting to directly use the concept of code-concatenation to achieve
a high number of active Sboxes over 4 rounds of the cipher. However, it is not only
a serial implementation that benefits from similar round functions. Similar round
functions are also very helpful for ensuring a minimum number of active Sboxes.
Assume that, using the code-concatenation approach, one can ensure that rounds
$R_i$ to $R_{i+3}$ have at least 16 active Sboxes. While this is nice, the problem is that
it does not ensure that rounds $R_{i-1}$ to $R_{i+2}$ or $R_{i+1}$ to $R_{i+4}$ have 16 active Sboxes
as well if the individual rounds are very different in nature. We therefore decided
to follow a design that on one hand allows us to use the freedom given by a fully
unrolled design and on the other hand still keeps the round functions similar enough
to prove some bounds on the resistance against linear and differential attacks.

In this context, one of the main features of the design is that decryption can 
be implemented on top of encryption with a minimal overhead. This is achieved 
by designing a cipher which is symmetric around the middle round, a very simple 
key scheduling, and a special choice of round constants. 

3.1 Aligning Encryption with Decryption 

The use of a core cipher having the $\alpha$-reflection property, with two additional
whitening keys, offers a nice alternative to the usual design strategy, which
consists in using involutional components; Noekeon, Khazad, Anubis,
Iceberg or SEA [33] are some examples of such ciphers with involutional
components. Actually, the general construction used in PRINCE has the following
advantages:


— It allows a much larger choice of Sboxes, which may lead to a lower
implementation cost, since the Sbox is not required to be an involution. It is worth
noticing that the fact that both the Sbox and its inverse are involved in the
encryption function does not affect the cost of the fully unrolled
implementations we consider;

— In ciphers with involutional components, the overhead due to the
implementation of the inverse key scheduling can be reduced by adding some symmetry
in the subkey sequence. But this may introduce weak keys or potential slide
attacks. The fact that all components are involutions may also introduce
some regularities in the cyclic structure of the cipher which can be exploited
in some attacks. The resistance of PRINCE to this type of attack will
be extensively discussed in Section 4.2.

— It is an open problem to prove the security of ciphers with ideal, involutional
components against generic attacks. We show in Section 4 that ciphers with
the $\alpha$-reflection property (for $\alpha \neq 0$) have a proof of security similar to that
of the FX construction.

— Previous approaches to minimizing the overhead of decryption over
encryption usually require a multiplexer in each round, while our approach requires
a multiplexer only once at the beginning of the circuit.

3.2 The PRINCE-Family: Choosing the Sbox 

As discussed in more detail in Section 5, the cost of the Sbox, i.e., its area
and critical path, is a substantial part of the overall cost. Thus, choosing an
Sbox which minimizes those costs is crucial for obtaining competitive results.
As the cost of an Sbox depends on various parameters, such as the technology,
the synthesis tool, and the library used, one cannot expect that there is one
optimal Sbox for all environments. In fact, in order to achieve optimal results it
is preferable to let each implementer choose the Sbox best suited to the target
environment. In order to ensure the security of the resulting design, an Sbox
$S : \mathbb{F}_2^4 \to \mathbb{F}_2^4$ for the PRINCE-Family has to fulfill the
following criteria.

1. The maximal probability of a differential is 1/4.

2. There are exactly 15 differentials with probability 1/4. 

3. The maximal absolute bias of a linear approximation is 1/4. 

4. There are exactly 30 linear approximations with absolute bias 1/4. 

5. Each of the 15 non-zero component functions has algebraic degree 3. 

As can be deduced from existing results, up to affine equivalence there are
only 8 Sboxes fulfilling those criteria. Thus, another way of defining an Sbox for
the PRINCE-Family is to say that it has to be affine equivalent to one of the
eight Sboxes $S_i$ given in the full version of this paper.
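
The five criteria can be checked mechanically for any candidate 4-bit Sbox. The following Python sketch is only an illustration: the function names are hypothetical, and the table `SBOX` is an assumed example input (the Sbox usually listed for PRINCE), not one of the tables $S_i$ referred to above, which are given only in the full version.

```python
# Illustrative check of the five PRINCE-Family criteria for a 4-bit Sbox.
# SBOX is an assumed example table; replace it with any candidate permutation.
SBOX = [0xB, 0xF, 0x3, 0x2, 0xA, 0xC, 0x9, 0x1,
        0x6, 0x7, 0x8, 0x0, 0xE, 0x5, 0xD, 0x4]


def parity(v):
    """Parity of the bits of v, i.e. an inner product over GF(2)."""
    return bin(v).count("1") & 1


def ddt(sbox):
    """ddt[a][b] = number of x with S(x) ^ S(x ^ a) == b."""
    n = len(sbox)
    table = [[0] * n for _ in range(n)]
    for a in range(n):
        for x in range(n):
            table[a][sbox[x] ^ sbox[x ^ a]] += 1
    return table


def lat(sbox):
    """lat[a][b] = #{x : <a,x> == <b,S(x)>} - n/2, so bias = lat[a][b] / n."""
    n = len(sbox)
    table = [[0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            agree = sum(parity(a & x) == parity(b & sbox[x]) for x in range(n))
            table[a][b] = agree - n // 2
    return table


def component_degree(sbox, b):
    """Algebraic degree of x -> <b, S(x)>, computed with the Moebius transform."""
    n = len(sbox)
    anf = [parity(b & sbox[x]) for x in range(n)]
    step = 1
    while step < n:
        for x in range(n):
            if x & step:
                anf[x] ^= anf[x ^ step]
        step <<= 1
    return max((bin(m).count("1") for m in range(n) if anf[m]), default=0)


def prince_family_report(sbox):
    """Collect the quantities that criteria 1 to 5 constrain."""
    d, l = ddt(sbox), lat(sbox)
    nz_d = [d[a][b] for a in range(1, 16) for b in range(16)]
    nz_l = [abs(l[a][b]) for a in range(16) for b in range(1, 16)]
    return {
        "max_diff_prob": max(nz_d) / 16,           # criterion 1 asks for 1/4
        "num_max_diff": nz_d.count(max(nz_d)),     # criterion 2 asks for 15
        "max_abs_bias": max(nz_l) / 16,            # criterion 3 asks for 1/4
        "num_max_bias": nz_l.count(max(nz_l)),     # criterion 4 asks for 30
        "degrees": [component_degree(sbox, b) for b in range(1, 16)],  # all 3
    }


if __name__ == "__main__":
    for name, value in prince_family_report(SBOX).items():
        print(name, value)
```

Running the report on a candidate table and comparing the five entries with the criteria gives a quick sanity check before any synthesis effort is spent on that Sbox.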

3.3 The Linear Layer 

In the $M$- and $M'$-layer the 64-bit state is multiplied with a $64 \times 64$ matrix $M$
(resp. $M'$) defined below. We have different requirements for the two different
linear layers. The $M'$-layer is only used in the middle round, thus $M'$ has to
be an involution to ensure the $\alpha$-reflection property. This requirement does not
apply to the $M$-layer used in the round functions. Here we want to ensure full
diffusion after two rounds. To achieve this we combine the $M'$-mapping with an
application of the matrix $SR$, which behaves like the AES shift rows and permutes
the 16 nibbles in the following way.

 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
 →
 0  5 10 15  4  9 14  3  8 13  2  7 12  1  6 11

Thus $M = SR \circ M'$.
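
As a small illustration of the nibble permutation, the Python sketch below applies $SR$ to a list of 16 nibbles. The direction of the arrow is read here as "output nibble $i$ is input nibble `SR[i]`"; this reading is an assumption, and the function name is hypothetical. Either reading yields a permutation of order 4, which is why $SR$, unlike $M'$, is not an involution.

```python
# Nibble permutation SR, taken from the table above.
SR = [0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11]


def shift_rows(nibbles):
    """Return the permuted state: output nibble i is input nibble SR[i] (assumed reading)."""
    return [nibbles[SR[i]] for i in range(16)]


def apply_times(state, count):
    """Apply shift_rows repeatedly."""
    for _ in range(count):
        state = shift_rows(state)
    return state


if __name__ == "__main__":
    state = list(range(16))
    print(shift_rows(state))
    print(apply_times(state, 2) == state)  # SR is not an involution
    print(apply_times(state, 4) == state)  # but SR has order 4
```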

Additionally the implementation costs should be minimized, meaning that the
number of ones in the matrices $M'$ and $M$ should be minimal, while at the same
time it should be guaranteed that at least 16 Sboxes are active in 4 consecutive
rounds (see the full version for details). Thus, trivially each output bit of an
Sbox has to influence 3 Sboxes in the next round and therefore the minimum
number of ones per row and column is 3. Thus we can use the following four
$4 \times 4$ matrices as building blocks for the $M'$-layer.


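$$
M_0 = \begin{pmatrix} 0&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{pmatrix}, \quad
M_1 = \begin{pmatrix} 1&0&0&0 \\ 0&0&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{pmatrix}, \quad
M_2 = \begin{pmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&0&0 \\ 0&0&0&1 \end{pmatrix}, \quad
M_3 = \begin{pmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&0 \end{pmatrix}
$$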


In the next step we generate a $4 \times 4$ block matrix $\hat{M}$ where each row and column
is a permutation of the four $4 \times 4$ matrices $M_0, \ldots, M_3$. The row permutations
are chosen such that we obtain a symmetric block matrix. The choice of the
building blocks and the symmetric structure ensures that the resulting $16 \times 16$
matrix is an involution. We define

$$
\hat{M}^{(0)} = \begin{pmatrix}
M_0 & M_1 & M_2 & M_3 \\
M_1 & M_2 & M_3 & M_0 \\
M_2 & M_3 & M_0 & M_1 \\
M_3 & M_0 & M_1 & M_2
\end{pmatrix}, \qquad
\hat{M}^{(1)} = \begin{pmatrix}
M_1 & M_2 & M_3 & M_0 \\
M_2 & M_3 & M_0 & M_1 \\
M_3 & M_0 & M_1 & M_2 \\
M_0 & M_1 & M_2 & M_3
\end{pmatrix}.
$$

In order to obtain a permutation for the full 64-bit state we construct a $64 \times 64$
block diagonal matrix $M'$ with $(\hat{M}^{(0)}, \hat{M}^{(1)}, \hat{M}^{(1)}, \hat{M}^{(0)})$ as diagonal blocks. The
matrix $M'$ is an involution with $2^{32}$ fixed points, which is average for a randomly
chosen involution (see page 596 of the cited reference). The linear layer $M$ is not
an involution anymore due to the composition of $M'$ and shift rows, which is not
an involution.
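
The construction of $M'$ from the four building blocks can be reproduced in a few lines. The Python sketch below is an illustration only (all function names are hypothetical); it builds $\hat{M}^{(0)}$, $\hat{M}^{(1)}$ and $M'$ as 0/1 matrices and checks over GF(2) that $M'$ is an involution with exactly three ones per row and column.

```python
def m_block(j):
    """Building block M_j: the 4x4 identity with the j-th diagonal entry cleared."""
    return [[1 if (r == c and r != j) else 0 for c in range(4)] for r in range(4)]


def m_hat(offset):
    """16x16 matrix whose (i, j) block is M_{(i + j + offset) mod 4}."""
    rows = []
    for bi in range(4):
        for r in range(4):
            row = []
            for bj in range(4):
                row.extend(m_block((bi + bj + offset) % 4)[r])
            rows.append(row)
    return rows


def m_prime():
    """64x64 block-diagonal matrix with diagonal blocks M^(0), M^(1), M^(1), M^(0)."""
    out = [[0] * 64 for _ in range(64)]
    for idx, offset in enumerate((0, 1, 1, 0)):
        block = m_hat(offset)
        for r in range(16):
            for c in range(16):
                out[16 * idx + r][16 * idx + c] = block[r][c]
    return out


def mat_mul_gf2(a, b):
    """Matrix product over GF(2)."""
    n = len(a)
    return [[sum(a[i][k] & b[k][j] for k in range(n)) & 1 for j in range(n)]
            for i in range(n)]


def is_identity(m):
    """True if m is the identity matrix."""
    return all(m[i][j] == (1 if i == j else 0)
               for i in range(len(m)) for j in range(len(m)))


if __name__ == "__main__":
    mp = m_prime()
    print("involution:", is_identity(mat_mul_gf2(mp, mp)))
    print("ones per row:", {sum(row) for row in mp})
    print("ones per column:", {sum(col) for col in zip(*mp)})
```

The involution property follows from the symmetric arrangement of the diagonal blocks $M_j$; the check above merely confirms it numerically.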


3.4 The Key Expansion 

The 128-bit key $(k_0 \| k_1)$ is extended to a 192-bit key $(k_0 \| k_0' \| k_1)$ by a linear
mapping of the form

$$(k_0 \| k_1) \mapsto (k_0 \| P(k_0) \| k_1).$$

This expansion should be such that it makes peeling of rounds (both at the
beginning and at the end) by partial key guessing difficult for the attacker. In
particular, we would like that each pair of subkeys among $k_1$ and the quantities
$(k_0 \oplus k_1)$ and $(k_0' \oplus k_1)$ takes all the $2^{128}$ possible values when $(k_0 \| k_1)$ varies in the
set of 128-bit words. In other words, the set of all triples $(k_0 \| P(k_0) \| k_1)$ should
correspond to an MDS code of length 3 and size $2^{128}$ over $\mathbb{F}_2^{64}$. This equivalently
means that both $x \mapsto P(x)$ and $x \mapsto x \oplus P(x)$ should be permutations of $\mathbb{F}_2^{64}$.
Note that no bit-permutation $P$ satisfies this condition. Indeed, both the all-zero
vector and the all-one vector satisfy $P(x) \oplus x = 0$.

Thus, a hardware-optimal choice for $P$ such that both $P$ and $P \oplus \mathrm{Id}$ are
permutations is

$$P(x) = (x \ggg 1) \oplus (x \gg 63),$$

i.e., $P(x_{63}, \ldots, x_0) = (x_0, x_{63}, \ldots, x_2, x_1 \oplus x_{63})$. Then, we can easily check that
$P(x) = 0$ (resp. $P(x) = x$) has a unique solution.
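
The two permutation requirements on $P$ can be verified numerically. The Python sketch below is an illustration with hypothetical helper names: since $P$ is linear over GF(2), it suffices to check that the images of the 64 unit vectors under $P$, and under $P \oplus \mathrm{Id}$, have rank 64. The last line shows why a pure rotation, being a bit-permutation, fails the second requirement.

```python
MASK64 = (1 << 64) - 1


def ror64(x, r):
    """Rotate a 64-bit word right by r positions."""
    return ((x >> r) | (x << (64 - r))) & MASK64


def P(x):
    """P(x) = (x >>> 1) xor (x >> 63), as defined above."""
    return ror64(x, 1) ^ (x >> 63)


def gf2_rank(vectors):
    """Rank over GF(2) of integers viewed as bit vectors (Gaussian elimination)."""
    pivots = {}
    for v in vectors:
        while v:
            high = v.bit_length() - 1
            if high not in pivots:
                pivots[high] = v
                break
            v ^= pivots[high]
    return len(pivots)


if __name__ == "__main__":
    units = [1 << i for i in range(64)]
    print("rank of P:", gf2_rank([P(e) for e in units]))
    print("rank of P + Id:", gf2_rank([P(e) ^ e for e in units]))
    # A rotation alone fixes the all-one word, so x -> x ^ ror64(x, 1) is not injective.
    print("rotation fixes all-one word:", ror64(MASK64, 1) == MASK64)
```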

4 Security Analysis 

This section investigates the security of the general construction of PRINCE.
In particular, we show that the $\alpha$-reflection property of the core cipher does
not introduce any generic attack with complexity significantly lower than the
known generic attacks against the FX construction. However, in the particular
case of PRINCE$_{core}$, the $\alpha$-reflection property comes from some symmetries in
the construction, including the use of an involution as middle round. Thus,
we investigate in Section 4.2 whether weaknesses similar to those identified for
involutional ciphers could also appear in the case of PRINCE. An evaluation
of the security of PRINCE regarding more classical attacks, including linear,
differential and algebraic attacks but also the recently introduced biclique
attacks, is provided in the full version.

4.1 On Generic Attacks: Security Proof 

The FX construction, introduced by Rivest for increasing the resistance of DES
to exhaustive key-search, consists in deriving a block cipher $E$ with $(2n + \kappa)$-bit
key and $n$-bit block from a block cipher $F$ with $\kappa$-bit key and $n$-bit block by
xoring the input and output of $F$ with a pre-whitening key and a post-whitening
key:

$$E_{k_0,k_1,k_2}(x) = F_{k_1}(x \oplus k_0) \oplus k_2.$$

Kilian and Rogaway proved that, if the core cipher $F$ is ideal, then this
construction achieves $(\kappa + n - 1 - \log T)$-bit security where $T$ is the number of
pairs of inputs and outputs for $F$ known by the attacker. This result obviously
does not apply in the case of PRINCE since the core cipher $F$ in PRINCE can
be easily distinguished from a family of random permutations due to the
$\alpha$-reflection property, i.e., $F_k^{-1} = F_{k \oplus \alpha}$ for any $k$. Here, we want to quantify the
impact of this property on the generic attacks against the FX construction.
For instance, it appears that a decryption oracle also gives a related-key oracle
with the fixed-key relation $(k_0, k_2, k_1) \to (k_2, k_0, k_1 \oplus \alpha)$ and it is important to
determine whether an adversary can profit from this relation.
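
The fixed-key relation can be observed on a toy example. The Python sketch below is not PRINCE: it uses an 8-bit block, a seeded random permutation `R` as outer layer, a nibble swap as involutional middle layer and an arbitrary constant `ALPHA`, all of which are assumptions made only for illustration. It shows that, for a core cipher with $F_k^{-1} = F_{k \oplus \alpha}$, decryption under $(k_0, k_1, k_2)$ is encryption under $(k_2, k_1 \oplus \alpha, k_0)$.

```python
import random

SIZE = 256      # toy 8-bit block, an assumption for illustration
ALPHA = 0xC0    # arbitrary nonzero reflection constant

_rng = random.Random(2012)
R = list(range(SIZE))
_rng.shuffle(R)                 # outer layer: a fixed random permutation
R_INV = [0] * SIZE
for _x, _y in enumerate(R):
    R_INV[_y] = _x


def middle(x):
    """Involutional middle layer: swap the two nibbles of a byte."""
    return ((x << 4) | (x >> 4)) & 0xFF


def core(k1, x):
    """Toy core cipher R^-1 o Add_{k1^ALPHA} o middle o Add_{k1} o R."""
    return R_INV[middle(R[x] ^ k1) ^ k1 ^ ALPHA]


def fx_encrypt(k0, k1, k2, m):
    """FX construction: E_{k0,k1,k2}(m) = F_{k1}(m ^ k0) ^ k2."""
    return core(k1, m ^ k0) ^ k2


def fx_decrypt(k0, k1, k2, c):
    """Decryption reuses the encryption routine with the related key."""
    return fx_encrypt(k2, k1 ^ ALPHA, k0, c)


if __name__ == "__main__":
    k0, k1, k2 = 0x3A, 0x5C, 0xE1
    ok = all(fx_decrypt(k0, k1, k2, fx_encrypt(k0, k1, k2, m)) == m
             for m in range(SIZE))
    print("decryption via related key works:", ok)
```

In hardware terms this is what keeps the decryption overhead small: only the key inputs are changed, while the datapath is shared.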

A similar question was investigated by Kilian and Rogaway for showing that
the complementation property of DES decreases the security level by a single
bit (Section 4 of their paper). In the case of the $\alpha$-reflection property, we like to model the
core cipher $F$ as an ideal cipher, that is as a set of random permutations, with
the (only!) additional relation that $F_{k \oplus \alpha}(x) = F_k^{-1}(x)$. Informally, this can be
seen as picking only half of the $2^\kappa$ permutations independently at random, while
the second half is defined by the encryption vs decryption relation above.

More precisely, we consider for $F$ a keyed permutation with a $(\kappa - 1)$-bit key,
operating on $n$-bit blocks. Let $\alpha$ be a nonzero element in $\mathbb{F}_2^\kappa$. We decompose the
set of $\kappa$-bit words into two subsets as $\mathbb{F}_2^\kappa = H \cup (\alpha \oplus H)$ where $H$ is some linear
subspace of dimension $(\kappa - 1)$ which does not contain $\alpha$, e.g., if $\mathrm{lsb}(\alpha) = 1$, $H$ is
the set of all $\kappa$-bit words $x$ with $\mathrm{lsb}(x) = 0$. In the following, $H$ is identified with
the set of $(\kappa - 1)$-bit words. It is worth noticing that such a decomposition does
not exist when $\alpha = 0$, i.e., when $F$ is an involution. Therefore, the following
construction is defined for $\alpha \neq 0$ only. Now, we derive from $F$ a block cipher
with $(2n + \kappa)$ key bits and $n$-bit blocks:

$$
\widetilde{FX}_{k_0,k_1,k_2}(m) =
\begin{cases}
F_{k_1}(m \oplus k_0) \oplus k_2 & \text{if } k_1 \in H \\
F^{-1}_{k_1 \oplus \alpha}(m \oplus k_0) \oplus k_2 & \text{if } k_1 \in (\alpha \oplus H)
\end{cases}
$$

This construction, which we refer to as the $\widetilde{FX}$-construction, corresponds to the FX
construction applied to $\widetilde{F}$ where $\widetilde{F}$ is the family of $2^\kappa$ permutations defined by

$$
\widetilde{F}_k(x) =
\begin{cases}
F_k(x) & \text{if } k \in H \\
F^{-1}_{k \oplus \alpha}(x) & \text{if } k \in (\alpha \oplus H)
\end{cases}
$$

The only difference with the construction considered in the case of the
complementation property is that $F$ is extended by using the inverse permutations
$F_k^{-1}$, $k \in H$, instead of the permutations themselves. But, we can obtain a similar
result.

More precisely, when analyzing the original FX construction, Kilian and
Rogaway consider the following problem. Let $\mathcal{A}$ be an adversary with access to
three oracles: $E$, $F$ and $F^{-1}$. During the game, the adversary may make queries
to $E$, to $F$ and $F^{-1}$. Any query to the $F/F^{-1}$ oracle consists of a pair $(k, x)$ in
$\mathbb{F}_2^\kappa \times \mathbb{F}_2^n$ and the oracle returns an element in $\mathbb{F}_2^n$. A query to the $E$ oracle consists
of an $n$-bit element, and an $n$-bit value is returned. The aim of this adversary is
then to guess whether the $E$ oracle computes $FX_k$ for some random key $k$, or if
it computes $\pi$ for a random permutation $\pi$ of $\mathbb{F}_2^n$. Then, a game-hopping argument
leads to the following upper bound on the advantage of any such adversary.

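Theorem 1. The advantage of any adversary who makes $D$ queries to the
$E$ oracle and $T$ queries to the $F/F^{-1}$ oracle satisfies

$$
\mathrm{Adv}^{FX}(\mathcal{A}) = \Big| \Pr\big[k \leftarrow \mathbb{F}_2^{\kappa+2n},\ F \leftarrow (\mathcal{P}_n)^{2^\kappa} : \mathcal{A}^{FX_k, F, F^{-1}} = 1\big]
- \Pr\big[\pi \leftarrow \mathcal{P}_n,\ F \leftarrow (\mathcal{P}_n)^{2^\kappa} : \mathcal{A}^{\pi, F, F^{-1}} = 1\big] \Big| \le DT\, 2^{-(n+\kappa-1)},
$$

where $x \leftarrow S$ means that $x$ is uniformly chosen at random from a set $S$, $\mathcal{P}_n$
denotes the set of permutations of $\mathbb{F}_2^n$ and $F \leftarrow (\mathcal{P}_n)^{2^\kappa}$ means that $F$ is a family
of $2^\kappa$ independently chosen random permutations.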

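We deduce a similar result for the $\widetilde{FX}$ construction.

Corollary 1. The advantage of any adversary who makes $D$ queries to the $E$
oracle and $T$ queries to the $F/F^{-1}$ oracle satisfies

$$
\mathrm{Adv}^{\widetilde{FX}}(\mathcal{A}) = \Big| \Pr\big[k \leftarrow \mathbb{F}_2^{\kappa+2n},\ F \leftarrow (\mathcal{P}_n)^{2^{\kappa-1}} : \mathcal{A}^{\widetilde{FX}_k, F, F^{-1}} = 1\big]
- \Pr\big[\pi \leftarrow \mathcal{P}_n,\ F \leftarrow (\mathcal{P}_n)^{2^{\kappa-1}} : \mathcal{A}^{\pi, F, F^{-1}} = 1\big] \Big| \le DT\, 2^{-(n+\kappa-2)}.
$$

Proof. We decompose

$$
\begin{aligned}
P_c &= \Pr\big[k \leftarrow \mathbb{F}_2^{\kappa+2n},\ F \leftarrow (\mathcal{P}_n)^{2^{\kappa-1}} : \mathcal{A}^{\widetilde{FX}_k, F, F^{-1}} = 1\big] \\
&= \Pr\big[k_0, k_2 \leftarrow \mathbb{F}_2^n,\ k_1 \leftarrow H,\ F \leftarrow (\mathcal{P}_n)^{2^{\kappa-1}} : \mathcal{A}^{\widetilde{FX}_{k_0,k_1,k_2}, F, F^{-1}} = 1\big] \times \Pr[k_1 \in H] \\
&\quad + \Pr\big[k_0, k_2 \leftarrow \mathbb{F}_2^n,\ k_1 \leftarrow \alpha \oplus H,\ F \leftarrow (\mathcal{P}_n)^{2^{\kappa-1}} : \mathcal{A}^{\widetilde{FX}_{k_0,k_1,k_2}, F, F^{-1}} = 1\big] \times \Pr[k_1 \in \alpha \oplus H] \\
&= \tfrac{1}{2} \Pr\big[k_0, k_2 \leftarrow \mathbb{F}_2^n,\ k_1 \leftarrow H,\ F \leftarrow (\mathcal{P}_n)^{2^{\kappa-1}} : \mathcal{A}^{FX_{k_0,k_1,k_2}, F, F^{-1}} = 1\big] \\
&\quad + \tfrac{1}{2} \Pr\big[k_0, k_2 \leftarrow \mathbb{F}_2^n,\ k_1 \leftarrow H,\ F \leftarrow (\mathcal{P}_n)^{2^{\kappa-1}} : \mathcal{A}^{FX^{-1}_{k_0,k_1,k_2}, F, F^{-1}} = 1\big],
\end{aligned}
$$

where the last equality uses

$$
\widetilde{FX}_{k_0,k_1,k_2}(x) =
\begin{cases}
FX_{k_0,k_1,k_2}(x) & \text{if } k_1 \in H \\
FX^{-1}_{k_2, k_1 \oplus \alpha, k_0}(x) & \text{if } k_1 \in \alpha \oplus H.
\end{cases}
$$

Moreover,

$$
\Pr\big[\mathcal{A}^{FX^{-1}_{k_0,k_1,k_2}, F, F^{-1}} = 1\big] = \Pr\big[\mathcal{A}^{FX_{k_0,k_1,k_2}, F, F^{-1}} = 1\big],
$$

leading to

$$
P_c = \Pr\big[k_0, k_2 \leftarrow \mathbb{F}_2^n,\ k_1 \leftarrow H,\ F \leftarrow (\mathcal{P}_n)^{2^{\kappa-1}} : \mathcal{A}^{FX_{k_0,k_1,k_2}, F, F^{-1}} = 1\big].
$$

It directly follows from Theorem 1 that

$$
\mathrm{Adv}^{\widetilde{FX}}(\mathcal{A}) = \mathrm{Adv}^{FX}(\mathcal{A}) \le DT\, 2^{-(n+\kappa-2)}.
$$

□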

As noticed in prior work, this bound is still valid in a chosen-ciphertext scenario; it can
also be extended to the case where the whitening keys are related, for instance
if $k_2 = k_0$ or $k_2 = P(k_0)$ as in PRINCE. Both generalizations apply to the $\widetilde{FX}$
construction as well.

The bound obtained for the FX construction is achieved, for instance by the
slide attack due to Biryukov and Wagner and by its recent generalization
named slidex. A chosen-plaintext variant of this attack allows exploiting
the $\alpha$-reflection property for reducing the security level by one bit, compared to
the original FX construction. This attack, detailed in the full version, has an
average time complexity corresponding to $2^{\kappa+n-\log_2 D}$ computations of the core
cipher $F$ for any number $D$ of pairs of chosen plaintexts-ciphertexts.

4.2 Impact of the Construction Implementing the $\alpha$-Reflection Property

As mentioned earlier, one particular feature of PRINCE is the $\alpha$-reflection
property of PRINCE$_{core}$. But, not surprisingly, the construction we used for
obtaining this feature also has structural properties, including an involutional middle
round, and care has to be taken when designing a cipher with such a structure.
In this section we analyse the influence of this construction on the security of
the cipher. In particular, we are interested in the so-called profile of the core
cipher, i.e., in the sequence of the lengths of all cycles in the decomposition of
PRINCE$_{core}$.

A first strategy for exploiting some information on the profile of the core
cipher is the following. If the cycle decomposition of the core cipher is independent
of the key, then this decomposition can be used as a distinguishing property for
recovering some information on the whitening keys. The simplest illustration of
this type of attack is when the core cipher is an involution, i.e., when $\alpha = 0$, which
is the only case where Corollary 1 does not apply. Indeed, the attack presented by
Dunkelman et al. (Section 5.2 of their paper) allows recovering the sum of the two whitening
keys $(k_0 \oplus k_2)$ in the FX construction when $F$ is an involution. This attack uses
the fact that for two plaintext-ciphertext pairs $(m, c)$ and $(m', c')$ related by
$m' = E^{-1}_{k_0,k_1,k_2}(m \oplus k_0 \oplus k_2)$ it holds that $m \oplus c = m' \oplus c'$. Indeed,

$$
\begin{aligned}
m' \oplus c' &= E^{-1}_{k_0,k_1,k_2}(m \oplus k_0 \oplus k_2) \oplus m \oplus k_0 \oplus k_2 \\
&= k_0 \oplus F^{-1}_{k_1}(m \oplus k_0) \oplus m \oplus k_0 \oplus k_2 = F_{k_1}(m \oplus k_0) \oplus m \oplus k_2 \\
&= m \oplus c
\end{aligned}
$$

where the last-but-one equality uses that $F_{k_1}$ is an involution. Thus,
plaintext-ciphertext pairs $(m, c)$ and $(m', c')$ such that $c' = m \oplus k_0 \oplus k_2$ can be easily
detected. Such a collision can be found if the attacker has access to a number of known
plaintext-ciphertext pairs on the order of $2^{n/2}$, and it provides the value of $(k_0 \oplus k_2)$. Moreover, in
the particular case of PRINCE, $k_2$ is related to $k_0$ by $k_2 = P(k_0)$ where $x \mapsto
x \oplus P(x)$ is a permutation (see Section 3.4). Therefore, the whitening key $k_0$
can be deduced from $(k_0 \oplus k_2)$ in this case. It follows that, when the core cipher
is an involution, the whole key can then be recovered with time complexity $2^\kappa$
(corresponding to an exhaustive search for $k_1$) and data complexity on the order of $2^{n/2}$. This
confirms that Corollary 1 does not hold for $\alpha = 0$.
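
The collision property $m \oplus c = m' \oplus c'$ can be observed on a toy FX instance with an involutional core. The Python sketch below is an illustration under stated assumptions: an 8-bit block, a seeded random outer permutation and a nibble-swap middle layer (none of them part of PRINCE), and the full codebook as known data instead of a birthday-sized sample. Every pair of known texts with equal $m \oplus c$ votes for the candidate $\delta = c' \oplus m$; the correct value $k_0 \oplus k_2$ receives one vote per plaintext.

```python
import random
from collections import Counter

SIZE = 256  # toy 8-bit block, an assumption for illustration

_rng = random.Random(7)
R = list(range(SIZE))
_rng.shuffle(R)
R_INV = [0] * SIZE
for _x, _y in enumerate(R):
    R_INV[_y] = _x


def middle(x):
    """Involutional middle layer: swap the two nibbles of a byte."""
    return ((x << 4) | (x >> 4)) & 0xFF


def core(k1, x):
    """Toy involutional core cipher (the alpha = 0 case)."""
    return R_INV[middle(R[x] ^ k1) ^ k1]


def fx_encrypt(k0, k1, k2, m):
    """FX construction around the toy core."""
    return core(k1, m ^ k0) ^ k2


def whitening_sum_votes(pairs):
    """Count votes for delta = c' ^ m over all pairs with m ^ c == m' ^ c'."""
    by_sum = {}
    for m, c in pairs:
        by_sum.setdefault(m ^ c, []).append((m, c))
    votes = Counter()
    for group in by_sum.values():
        for m, _ in group:
            for _, c2 in group:
                votes[c2 ^ m] += 1
    return votes


if __name__ == "__main__":
    k0, k1, k2 = 0x3A, 0x5C, 0xE1
    known = [(m, fx_encrypt(k0, k1, k2, m)) for m in range(SIZE)]
    votes = whitening_sum_votes(known)
    print("votes for k0 ^ k2:", votes[k0 ^ k2])
    print("best candidates:", [hex(d) for d, _ in votes.most_common(3)])
```

With a larger block size one would keep only a birthday-sized set of known pairs and look for a single collision, as described above; the voting over the full codebook is merely convenient at toy scale.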

This type of attack can be generalized to the case where the profile of the
core cipher does not depend on $k_1$: since PRINCE$_{core}$ has a reasonable block
size, its cycle structure could be precomputed and then used as a distinguishing
property for $(k_0 \oplus k_2)$. Indeed, the profile of $E_{k_0,k_1,k_2} : m \mapsto k_2 \oplus F_{k_1}(m \oplus k_0)$
depends on $(k_0 \oplus k_2)$ only. It follows that, for each $n$-bit word $\delta$, we could compute
one or a few cycles of $x \mapsto F_{k_1}(x \oplus k_0 \oplus k_2 \oplus \delta)$ in a chosen-plaintext scenario
where the attacker knows a sequence of plaintext-ciphertext pairs $(m_i, c_i)$ with
$m_{i+1} = c_i \oplus \delta$. A valid candidate for $(k_0 \oplus k_2)$ is a value $\delta$ which leads to a cycle
having a length which appears in the precomputed profile of $F_{k_1}$.

We checked whether the cycle structure of PRINCE$_{core}$ has some peculiarities
which do not depend on its key. Based on the technique used by Biryukov for
analyzing involutional ciphers, we can observe that the profile of the reduced
version of PRINCE$_{core}$ with 4 Sbox layers, where we keep the symmetry around
the middle, does not depend on the key. Actually, this reduced version can be
written as

$$G = (R_5'^{-1} \circ \mathrm{Add}_{k_1 \oplus \alpha}) \circ (S^{-1} \circ M' \circ S) \circ (\mathrm{Add}_{k_1} \circ R_5')$$

where $R_5'$ corresponds to $R_5$ without the key addition. Since $S^{-1} \circ M' \circ S$ is an
involution, the cycle structure of $\mathrm{Add}_{k_1 \oplus \alpha} \circ (S^{-1} \circ M' \circ S) \circ \mathrm{Add}_{k_1}$ depends on $\alpha$
only and not on $k_1$. Its profile then remains unchanged after a right composition
with $R_5'$ and a left composition with its inverse. However, this property does not
hold anymore when an additional round is included since the next key addition
$\mathrm{Add}_{k_1 \oplus \alpha} \circ G \circ \mathrm{Add}_{k_1}$ modifies the cycle structure of $G$ in a way which depends on
the values of $G$, and not only on its profile. Therefore, it appears that the previously
mentioned attack strategy does not apply if PRINCE$_{core}$ contains more than 6
Sbox layers.
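
The key-independence of the profile of such a reduced construction can be illustrated on a toy permutation. In the Python sketch below, the 8-bit block, the seeded random permutation standing in for $R_5'$, the nibble swap standing in for $S^{-1} \circ M' \circ S$ and the constant `ALPHA` are all assumptions chosen for illustration. The map $G$ is a conjugate of $\mathrm{Add}_\alpha$ composed with the involution, so its sorted list of cycle lengths is the same for every $k_1$.

```python
import random

SIZE = 256      # toy 8-bit block, an assumption for illustration
ALPHA = 0xC0    # arbitrary nonzero constant

_rng = random.Random(5)
R = list(range(SIZE))
_rng.shuffle(R)                 # stands in for the keyless outer round
R_INV = [0] * SIZE
for _x, _y in enumerate(R):
    R_INV[_y] = _x


def middle(x):
    """Stands in for the involution S^-1 o M' o S: swap the two nibbles."""
    return ((x << 4) | (x >> 4)) & 0xFF


def reduced_core(k1):
    """Lookup table of G = R^-1 o Add_{k1^ALPHA} o middle o Add_{k1} o R."""
    return [R_INV[middle(R[x] ^ k1) ^ k1 ^ ALPHA] for x in range(SIZE)]


def cycle_profile(perm):
    """Sorted list of the cycle lengths of a permutation given as a table."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if not seen[start]:
            length, x = 0, start
            while not seen[x]:
                seen[x] = True
                x = perm[x]
                length += 1
            lengths.append(length)
    return sorted(lengths)


if __name__ == "__main__":
    profiles = {tuple(cycle_profile(reduced_core(k1))) for k1 in range(SIZE)}
    print("distinct profiles over all keys:", len(profiles))
```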

In the light of the previous analysis, a more relevant attack method consists
in using the fact that the core cipher may have a peculiar cycle decomposition
for some weak keys. For instance, if there exist some weak keys $k_1$ for which
PRINCE$_{core}$ is an involution, then this class of keys can be detected from the
knowledge of a number of plaintext-ciphertext pairs on the order of $2^{n/2}$ by counting the number of
collisions for $m \oplus c$. The technique that we have previously described
then also recovers the whitening key. It is worth noticing that this attack applies to
DESX and allows detecting the use of the four weak keys of DES for which
DES is an involution. A similar weakness would appear if, in PRINCE$_{core}$, we
had used two subkeys $k_1$ and $k_1'$ in turn as round keys. Keeping the remaining
structure of PRINCE$_{core}$ results in the following relation

$$F^{-1}_{(k_1 \| k_1')} = F_{(k_1' \oplus \alpha \,\|\, k_1 \oplus \alpha)}.$$

However, this has serious - and interesting - consequences for the security of the
resulting cipher. For the class of keys such that $k_1' = k_1 \oplus \alpha$, it holds that

$$F^{-1}_{(k_1 \| k_1')} = F_{(k_1 \| k_1')},$$

that is, the core cipher is an involution. This class of weak keys can then be
easily detected. It then appears that some particular related-key distinguishers
for the core cipher may be exploited for detecting the corresponding class of
keys. To be very clear, we do not consider related-key attacks here in the classical
sense of enlarging the power of an adversary. But without a careful choice, the
construction we used for implementing the $\alpha$-reflection property might result in
key-recovery attacks for certain weak-key classes, as soon as the core cipher is
vulnerable to related-key attacks.

5 Implementation

Besides low latency, which is the main target, a low-cost hardware implementation is one of
the design objectives of PRINCE. To achieve low latency, a fully unrolled design
should be considered for implementation. During the design process of PRINCE
the cost of each function was investigated and each component was carefully
designed in order to get the lowest possible gate count without compromising
security. One of the most critical and expensive operations of the cipher is the
substitution, where we use the same Sbox 16 times (rather than having 16 different
Sboxes). Therefore, the implementation of PRINCE started with a search for the
most suitable Sbox for the target design specifications. In order to achieve an
implementation with low delay and gate count, we analyzed many Sbox instances to
identify one with optimal combinational logic and propagation paths. Then, the
targeted unrolled design was implemented with the resulting optimal Sbox.

In the implementation process, Cadence NCVerilog 06.20-p001 is used for
simulation and Cadence Encounter RTL Compiler v10.1 for synthesis. Since gate
count and delay parameters are heavily technology dependent, the
implementations have been synthesized for three different technology libraries: 130 nm and
90 nm low-leakage Faraday libraries from UMC, and 45 nm generic NANGATE
Open Cell Library. In all syntheses, typical operating conditions were assumed.

The unrolled version of PRINCE is a direct mapping to hardware of the cipher
defined in Section 2. Multiplexers select encryption and decryption keys
accordingly. The only costs associated with the key whitening stages are XOR gates and
multiplexers used for whitening key selection. However, in practice, due to the
unrolled nature of the implementation, the round constant additions reduce to XOR operations
with constants, which in turn reduce to inverters or no additional gates at all.
Furthermore, these inverters are combined with the preceding or following matrix
multiplications, which are implemented with cascaded XOR gates. In cases where
an XOR is sourced by the output of an inverter, or is sourcing the input of an
inverter, it is simply replaced by an XNOR gate and the sourced/sourcing inverter
is removed. Since both XOR and XNOR have the same gate count, the overall
effect of the round constant addition on area reduces to zero.

The results of the unrolled implementation of PRINCE are listed in Table 1 for
different technologies with respect to different timing constraints. In this table, a
unit delay (UD) parameter is used to enable a fair comparison between
different technologies. It is the average delay of a single inverter gate (with lowest
drive, X1) within a ring oscillator under zero wireload conditions in the target
technology (6.7 ps, 31.9 ps, and 43.6 ps for 45 nm, 90 nm, and 130 nm,
respectively). We also implemented PRESENT-80, PRESENT-128, LED-128 and
AES-128 and applied the same metrics to adequately evaluate the achievements
of our new cipher (note that in some cases the key size - and also our security
claim - is different: PRINCE does not claim to offer 128-bit security and security
against related-key attacks). In order to achieve both encryption and decryption
capability in PRESENT and LED, we had to implement both true and inverse
Sboxes and select their output by a multiplexer, which doubled the Sbox area
with respect to an encryption-only implementation. For AES, we just had to
implement the inverse affine transform since the finite field inversion module
could be shared between encryption and decryption. In addition to this
comparison, Table 2 shows the extrapolated results (which are calculated by removing
register and control logic area from the total gate count, and multiplying the
rest by the number of rounds) for other unfolded cipher instances obtained from
round-based cipher implementations provided by previous works. Note that all
ciphers in the table include encryption and decryption functionality with 128-bit
key size; however, the comparison is difficult as the block size is different in some
cases (also note that the ciphers having 128-bit block size are obviously much
bigger and more power consuming than a 64-bit block cipher).

We also measured the maximum frequencies achievable by unrolled versions of
PRINCE under two different conditions: the frequency where the area of the
synthesized design starts to deviate from the unconstrained area (158.9, 38.4 and
35.5 MHz), and the frequency where the timing slack becomes zero (212.8, 71.8
and 54.3 MHz). Both figures are given for 45 nm, 90 nm, and 130 nm, respectively.
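
To relate the unit-delay constraints used in Table 1 to absolute timing, the constraint can be multiplied by the inverter delay of the respective technology. The Python sketch below is a simple illustration; the function names are hypothetical, and reading the constraint as the clock period of a single-cycle unrolled design is an assumption, not a statement about the synthesis setup.

```python
# Unit delay of one inverter in picoseconds, as given in the text.
UNIT_DELAY_PS = {"45nm": 6.7, "90nm": 31.9, "130nm": 43.6}


def constraint_ns(unit_delays, tech):
    """Timing constraint in nanoseconds for a constraint given in unit delays."""
    return unit_delays * UNIT_DELAY_PS[tech] / 1000.0


def frequency_mhz(unit_delays, tech):
    """Clock frequency in MHz if the constraint is taken as the clock period (assumption)."""
    return 1000.0 / constraint_ns(unit_delays, tech)


if __name__ == "__main__":
    for tech in UNIT_DELAY_PS:
        for ud in (1000, 3162, 10000):
            print(tech, ud, "UD =", round(constraint_ns(ud, tech), 2), "ns,",
                  round(frequency_mhz(ud, tech), 1), "MHz")
```

For example, 1000 UD in the 45 nm library corresponds to $1000 \times 6.7\,\mathrm{ps} = 6.7\,\mathrm{ns}$.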


Table 1. Area/power comparison of unrolled versions of PRINCE and other ciphers

Tech.                      Nangate 45nm Generic         UMC 90nm Faraday             UMC 130nm Faraday
Constr. (UD)               1000      3162      10000    1000      3162      10000    1000      3162      10000

PRINCE       Area (GE)     8260      8263      8263     7996      7996      7996     8679      8679      8679
             Power (mW)    38.5      17.9      8.3      26.3      10.9      3.9      29.8      11.8      4.1
PRESENT-80   Area (GE)     63942     51631     50429    113062    49723     49698    119196    51790     51790
             Power (mW)    1304.6    320.9     ?        1436.9    144.9     45.5     1578.4    134.9     42.7
PRESENT-128  Area (GE)     68908     56668     ?        120271    54576     54525    ?         56732     56722
             Power (mW)    1327.1    330.4     99.1     1491.1    149.9     47.8     1638.7    137.4     43.6
LED-128      Area (GE)     109811    109958    109697   281240    286779    98100    236770    235106    111496
             Power (mW)    2470.7    835.7     252.3    5405.0    1076.3    133.7    5274.8    1133.9    163.6
AES-128      Area (GE)     135051    135093    118440   421997    130835    118522   347860    141060    130764
             Power (mW)    3265.8    1165.7    301.6    8903.2    587.4     ?        8911.2    876.8     229.1


Table 2. Extrapolated area of unrolled versions of other ciphers against PRINCE

Cipher          Area* (GE)    Technology
CLEFIA-128      28035         18 rounds unfolded, 130nm CMOS
HIGHT-128       42688         32 rounds unfolded, 250nm CMOS
mCrypton-128    37635         13 rounds unfolded, 130nm CMOS
Piccolo-128     25668         31 rounds unfolded, 130nm CMOS

* Area requirements extrapolated from round-based implementations.

Abstract. In this paper, we study differential attacks against ARX
schemes. We build upon the generalized characteristics of de Canniere 
and Rechberger; we introduce new multi-bit constraints to describe dif- 
ferential characteristics in ARX designs more accurately, and quartet 
constraints to analyze boomerang attacks. We also describe how to prop- 
agate those constraints; this can be used either to assist manual con- 
struction of a differential characteristic, or to extract more information 
from an already built characteristic. We show that our new constraints 
are more precise than what was used in previous works, and can detect 
more cases of incompatibility. 

In particular, we show that several published attacks are in fact
invalid because the differential characteristics cannot be satisfied. This
highlights the importance of verifying differential attacks more thoroughly.

Keywords: Symmetric ciphers, Hash functions, ARX, Generalized char- 
acteristics, Differential attacks, Boomerang attacks. 

1 Introduction 

A popular way to construct cryptographic primitives is the so-called ARX design,
where the construction only uses Additions ($a \boxplus b$), Rotations ($a \ggg i$), and Xors
($a \oplus b$). These operations are very simple and can be implemented efficiently in
software or in hardware, but when mixed together, they interact in complex and
non-linear ways. In particular, two of the SHA-3 finalists, Blake and Skein, follow
this design strategy. More generally, functions of the MD/SHA family are built
using Additions, Rotations, Xors, but also bitwise Boolean functions, and logical
shifts; they are sometimes also referred to as ARX. This strategy has also been
used for stream ciphers such as Salsa20 and ChaCha, and block ciphers, such
as TEA, XTEA, HIGHT, or SHACAL (RC5 uses additions and data-dependent
rotations, but we only consider constructions with fixed rotations).
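
For readers less familiar with the three operations, the following Python sketch shows them on 32-bit words together with a small mixing step. The function `mix` and its inverse are hypothetical and are not taken from any of the primitives named above; they only illustrate how addition, rotation and xor are typically combined and that such a step remains invertible.

```python
MASK32 = 0xFFFFFFFF


def add32(a, b):
    """Modular addition of two 32-bit words."""
    return (a + b) & MASK32


def rotl32(a, i):
    """Rotate a 32-bit word left by i positions."""
    i %= 32
    return ((a << i) | (a >> (32 - i))) & MASK32


def rotr32(a, i):
    """Rotate a 32-bit word right by i positions."""
    return rotl32(a, (32 - i) % 32)


def mix(a, b, r):
    """Hypothetical ARX step: (a, b) -> (a + b, rotl(b, r) xor (a + b))."""
    s = add32(a, b)
    return s, rotl32(b, r) ^ s


def unmix(s, t, r):
    """Inverse of mix, showing that the step is a permutation of word pairs."""
    b = rotr32(t ^ s, r)
    return (s - b) & MASK32, b


if __name__ == "__main__":
    s, t = mix(0x01234567, 0x89ABCDEF, 7)
    print(hex(s), hex(t))
    print(unmix(s, t, 7) == (0x01234567, 0x89ABCDEF))
```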

The ARX design philosophy is opposed to S-Box based designs such as the
AES. Analysis of S-Box based designs usually happens at the word level, and
differential characteristics are relatively easy to build, but efficient attacks often
need novel techniques, such as the rebound attack against hash functions.
For ARX designs, the analysis is done at the bit level; finding good differential
characteristics remains an important challenge. In particular, the seminal
attacks on the MD/SHA family by the team of X. Wang are based on differential
characteristics built by hand, and an important effort has been devoted
to building tools to construct such characteristics automatically.
This effort has been quite successful for functions of the MD/SHA family, and
it has allowed new attacks based on specially designed characteristics: attacks
against HMAC, the construction of a rogue MD5 CA certificate, and
attacks against combiners.

X. Wang and K. Sako (Eds.): ASIACRYPT 2012, LNCS 7658, pp. 226-243, 2012.
(c) International Association for Cryptologic Research 2012

Another important problem is that the components of an ARX design can
interact in complex and unexpected ways. Differential characteristics are usually
built by looking at each operation individually, and multiplying the probabilities
of each non-linear operation, but this approach can lead to very misleading
results. For SHA-0 and SHA-1 differential characteristics, it has been shown that
the hypothesis of independence between the local collisions is flawed, and some
patterns of local collisions lead to impossible characteristics. Problems
have also been identified for differential attacks on SHACAL. More recently,
Mendel, Nad, and Schläffer have tackled the problem of building differential
characteristics for SHA-2, and found that many of them are in fact incompatible.

A similar problem has been discussed in the context of boomerang attacks by
Murphy: the assumption that the differential characteristics are independent
does not necessarily hold. Several recent works have found characteristics that
turned out to be incompatible when analyzing ARX hash functions such as
HAVAL, SHA-256, or Skein.


Our Results. In this paper, we try to provide a framework to study these
problems for ARX designs. In pure ARX functions, the modular addition is the
only source of non-linearity (with respect to the xor difference). Therefore it is
important to capture its behaviour as accurately as possible.

We extend the generalized characteristics of De Cannière and Rechberger
by introducing constraints involving several consecutive bits of a variable (i.e.
$x^{[i]}$ and $x^{[i-1]}$), instead of considering bits one by one. We show that constraints
on 2 consecutive bits can completely capture the modular difference, and we
introduce reduced sets of constraints on 1.5 and 2.5 consecutive bits. This is
motivated by the analysis of modular addition, but since these constraints are
still local, they interact well with bitwise Boolean operations and rotations, and
we can use them to study pure ARX as well as SHA-like constructions. We show
that they capture more information than the single-bit constraints of De Cannière
and Rechberger. In particular, we describe cases of incompatibility in ARX
characteristics due to interactions between consecutive bits, and we show that a
proposed path for Skein is invalid. This is detected automatically by our new
constraints.

We also study boomerang attacks, and introduce constraints on quartets of
variables, instead of considering each characteristic separately with constraints
on pairs of variables. This allows capturing some extra information in the middle
of the attack, when the top characteristic and the bottom characteristic meet. In
particular, we can automatically detect incompatibilities in previously published
attacks against Skein and Blake.

As opposed to previous work, our work is focused on local conditions, and we try to
extract as much information as possible from a single operation. If needed, it
can be combined with more computationally intensive techniques considering several
operations simultaneously.

Additionally, we give a complete description of how to compute the probability
of a characteristic using these constraints, and how to do constraint propagation.
All our code will be available from our webpage¹ so that these tools can be used
by the community to build or verify differential attacks. Our tools are quite
generic and we hope that they can be used to study more primitives. We don't
provide a complete solution to automatically find differential characteristics in
ARX schemes, but we believe our work is an important step in this direction.

This paper is organized as follows: first, we explain the theory of S-systems
and how to solve them efficiently in Section 2, and we show how to use S-systems
to study differential attacks using the generalized characteristics of De Cannière
and Rechberger in Section 3. In Section 4, we introduce multi-bit constraints and
show how they improve over previous results. Finally, in Section 5 we describe
quartet constraints to study boomerang attacks, and show that they can detect
incompatibilities in several attacks.

2 Analysis of S-systems 

Since ARX systems in general are hard to analyze, we first study systems without
rotations. An important remark is that a system of additions and xors can be
seen as a T-function [?], or more precisely, as an S-function [?]. We use the
following definitions:

T-function. A T-function on n-bit words with k inputs and l outputs is a
function from $(\{0,1\}^n)^k$ to $(\{0,1\}^n)^l$ with the following property:

For all t, the t least significant bits of the outputs can be computed
from the t least significant bits of the inputs.

S-function. An S-function on n-bit words is a function from $(\{0,1\}^n)^k$ to
$(\{0,1\}^n)^l$, for which we can define a small set of states $\mathcal{S}$, and an initial
state $S[-1] \in \mathcal{S}$ with the following property:

For all t, bit t of the outputs and the state $S[t] \in \mathcal{S}$ can be computed
from bit t of the inputs, and the state $S[t-1]$.

In practice, our analysis will be linear in the number of states, and the number 
of states can be exponential in the size of the system. We can only study systems 
with a limited number of states. 

For instance, the modular addition is an S-function, with a 1-bit state corresponding
to the carry. An S-function can also include bitwise functions, shifts to
the left by a fixed number of bits, or multiplications by constants. However, a
shift to the left by i bits, or a multiplication by a constant of i bits, increases
the state by a factor of $2^i$, so the analysis will only be practical for small
values of i. Note that the multiplication of two variables, or a data-dependent
shift to the left, are T-functions, but are not S-functions because the size of the
state has to grow with n.
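As a minimal sketch of the definition above (the function names are ours and are not part of the authors' tools), modular addition can be evaluated one bit at a time, with the carry as the only state:

```python
def add_step(x_bit, y_bit, carry):
    """One step of the S-function for z = x + y.

    Takes bit t of both inputs and the state S[t-1] (the incoming carry),
    and returns bit t of the output together with the new state S[t].
    """
    total = x_bit + y_bit + carry
    return total & 1, total >> 1


def add_bitwise(x, y, n):
    """Compute (x + y) mod 2**n by scanning the bits from LSB to MSB."""
    carry = 0  # initial state S[-1]
    z = 0
    for t in range(n):
        z_bit, carry = add_step((x >> t) & 1, (y >> t) & 1, carry)
        z |= z_bit << t
    return z
```

The state set has two elements here, which is why a system with s additions needs at most $2^s$ states.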

1 http://www.di.ens.fr/~leurent/arxtools.html

In this work, we consider systems of the form $f(P, x) = 0$ where f is an S-function,
P is a vector of p parameters, and x is a vector of v unknown variables.
This defines a family of systems, and we are interested in properties of the set
of solutions of the unknown x for a given P. We call such a system an S-system.
A simple and yet important example is the system

$x \oplus \Delta = x \boxplus \delta$ (1)

where the parameters are $\Delta$ and $\delta$. Solving this system is equivalent to finding a pair
of variables with a given modular difference and a given xor difference, and was
an important part of a recent attack on BMW [?].

It is well known that those systems are T-functions, and can be solved from the
least significant bit to the most significant bit. However, the naive approach to
solve such a system uses backtracking, and can lead to an exponential complexity
in the worst case².

2.1 Representation of S-systems Using Finite State Machines 

A more efficient strategy is to use an approach based on Finite State Machines,
or automata: any system of such equations can be represented by an automaton,
and solving a particular instance takes time proportional to the word length. This
kind of approach has been used to study differential properties of S-functions
in [?].

The first step to apply this technique is to build an automaton corresponding
to the system of equations. The states of this automaton correspond to the states
of the S-function in $\mathcal{S}$, i.e. the carry bits: a system with s modular additions
gives an automaton with $2^s$ states. The alphabet of the automaton is $\{0,1\}^{p+v}$:
each transition reads one bit from each parameter and each variable, starting
from the least significant bit. The automaton accepts $(P, x)$ if and only if
$f(P, x) = 0$.

We can then count the number of solutions to the system by counting paths 
in the graph corresponding to the automaton. In this work we mainly use this 
technique to decide whether a system is solvable, but we can also compute a 
random solution, or enumerate the set of solutions. 

If the S-system is given as an expression with additions and bitwise Boolean
operations, the transition table of the automaton can easily be constructed by
evaluating the expression for every possible state, every possible 1-bit parameter,
and every possible 1-bit variable.
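The construction can be sketched for System (1), written V0^P0==V0+P1 in the tool's syntax. This is an illustrative reimplementation under our own naming, not the authors' code: the state is the carry of the addition, the table is filled by evaluating the bit equation for every state and every input bit, and solutions are counted by counting paths.

```python
from itertools import product


def build_table():
    """table[(carry, p0_bit, p1_bit, x_bit)] -> next carry, or None if the
    bit equation x ^ P0 == x + P1 fails at this position."""
    table = {}
    for carry, p0, p1, x in product((0, 1), repeat=4):
        total = x + p1 + carry
        ok = (x ^ p0) == (total & 1)
        table[(carry, p0, p1, x)] = (total >> 1) if ok else None
    return table


def count_solutions(p0, p1, n, table):
    """Count the n-bit x with x ^ p0 == (x + p1) mod 2**n."""
    counts = {0: 1}  # paths reaching each state; the initial carry is 0
    for t in range(n):
        nxt = {}
        for carry, ways in counts.items():
            for x in (0, 1):
                s = table[(carry, (p0 >> t) & 1, (p1 >> t) & 1, x)]
                if s is not None:
                    nxt[s] = nxt.get(s, 0) + ways
        counts = nxt
    return sum(counts.values())


TABLE = build_table()
```

The cost is linear in the word length: `count_solutions(0x80000000, 0, 32, TABLE)` walks 32 steps to find that there is no solution, where backtracking would explore the 31 low bits exhaustively.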

Decision Automaton. When we remove the information about the variables
from the edges, we obtain a non-deterministic automaton which can decide
whether a system is solvable or not, i.e. whether there exists a choice of the
variable x so that $f(P, x) = 0$ for a given P. We can then optionally build an
equivalent deterministic automaton using the powerset construction.
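A small sketch of the powerset construction, again for System (1) and with hypothetical helper names: each deterministic state is the set of carries that some choice of x can reach, and the empty set is the dead state.

```python
def nfa_step(carry, a, b):
    """Carries reachable from `carry` on parameter bits (a, b), over both
    possible values of the variable bit x (the variable is projected away)."""
    out = set()
    for x in (0, 1):
        total = x + b + carry
        if (x ^ a) == (total & 1):
            out.add(total >> 1)
    return out


def determinize():
    """Build the deterministic decision automaton by the powerset construction."""
    start = frozenset({0})
    table, todo = {}, [start]
    while todo:
        state = todo.pop()
        if state in table:
            continue
        table[state] = {}
        for a in (0, 1):
            for b in (0, 1):
                nxt = frozenset(c for carry in state for c in nfa_step(carry, a, b))
                table[state][(a, b)] = nxt
                todo.append(nxt)
    return start, table


def solvable(delta_xor, delta_add, n, start, table):
    """Decide whether x ^ delta_xor == (x + delta_add) mod 2**n has a solution,
    using one table access per bit."""
    state = start
    for t in range(n):
        state = table[state][((delta_xor >> t) & 1, (delta_add >> t) & 1)]
    return len(state) > 0


START, DTABLE = determinize()
```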

2 e.g. to solve the system x ⊕ 0x80000000 = x, the backtracking algorithm will try all
possible values for the 31 lower bits of x before concluding that there is no solution.

Implementation. We have automated the construction of the FSM from a
simple description of the S-system. Our tool can deal with any system of additions
and bitwise Boolean functions. For instance System (1) will be written
as V0^P0==V0+P1, and System (2) will be written as P0|V0|V1; P1|V0|~V1;
P2|~V0|V1; P3|~V0|~V1. The variables are denoted by Vi and the parameters
by Pi, and the operations are written naturally with a C-like syntax. The tool outputs
the transition table of the automaton, and we have a collection of functions
to compute properties of the system from this table. From the FSM representation
of an S-system, we can automatically derive:

— Whether a given set of parameters leads to a compatible system

— A random solution when the system is compatible 

— The number of solutions (and the probability that a random x is a solution) 

— A description of the solution set, from which we can efficiently iterate over 
the solutions 

3 Study of Differential Characteristics 

The most basic approach to describe a differential characteristic is to choose
a difference operation (usually the modular difference ⊟ or the xor difference
⊕), and to specify the difference $x' - x$ for every internal variable of a cipher.
One can compute the probability of reaching the specified output difference for
each operation, and the probability of the full characteristic is computed by
multiplying the probabilities of each operation, under the assumption that the
probabilities are independent.

However, this approach is not very successful for ARX designs, because the
assumption of independence is very often false. To overcome this, Wang et al.
introduced the notion of a signed difference. For each bit, we now consider three
different possibilities:

— $x^{[i]} = x'^{[i]}$, this is denoted as 0;

— $x^{[i]} = 0$, $x'^{[i]} = 1$, this is denoted as +1;

— $x^{[i]} = 1$, $x'^{[i]} = 0$, this is denoted as −1.

This gives much better results in the presence of modular addition, because it 
combines both the modular difference and the xor difference. 


3.1 Generalized Constraints 

More generally, De Cannière and Rechberger noted that we can define a difference
characteristic by allowing certain subsets of the values of $(x, x')$ for each bit of
the cipher [?].

Table 1 shows the symbols they use to denote all the possible subsets in
$\mathcal{P}(\{0,1\}^2)$. For a given internal state variable x, and a constraint $\Delta$, we write
$\delta(x, x') = \Delta$ (or $\delta x = \Delta$ if there is no ambiguity) to mean that $(x, x')$ is
restricted to the subset defined by $\Delta$.
Table 1. Constraints used in [?]

Table 2. Trivial encoding

Since the definition of $\delta$ only involves bitwise operations, we can write it as an
S-system, if we encode $\Delta$ as shown in Table 2:

$P_0 = 0 \Rightarrow (x, x') \neq (0,0)$    $P_1 = 0 \Rightarrow (x, x') \neq (0,1)$
$P_2 = 0 \Rightarrow (x, x') \neq (1,0)$    $P_3 = 0 \Rightarrow (x, x') \neq (1,1)$

or equivalently:

$P_0 \lor x \lor x'$    $P_1 \lor x \lor \lnot x'$    $P_2 \lor \lnot x \lor x'$    $P_3 \lor \lnot x \lor \lnot x'$. (2)
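The encoding of System (2) can be checked with a few lines of code. This is only a sketch with our own function name; the two example parameter vectors correspond to the symbols - (equal bits) and x (different bits).

```python
def allowed(P, x, xp):
    """Evaluate System (2) for one bit position.

    P = (P0, P1, P2, P3) is the 4-bit encoding of the constraint; the pair
    (x, x') is allowed when all four clauses are satisfied.
    """
    P0, P1, P2, P3 = P
    nx, nxp = 1 - x, 1 - xp
    return bool((P0 | x | xp) & (P1 | x | nxp) & (P2 | nx | xp) & (P3 | nx | nxp))


def allowed_set(P):
    """List the pairs (x, x') permitted by the encoded constraint."""
    return {(x, xp) for x in (0, 1) for xp in (0, 1) if allowed(P, x, xp)}
```

For instance `allowed_set((1, 0, 0, 1))` keeps only the pairs with x = x', and `allowed_set((0, 1, 1, 0))` keeps only the pairs with x ≠ x'.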

3.2 Differential Characteristics 

In order to describe a differential characteristic with this framework, we specify
a difference for each internal variable of a cipher, and we consider the operations
that connect the variables. For each operation ⊙, we can write an S-system³:

$\delta x = \Delta_x$    $\delta y = \Delta_y$    $\delta z = \Delta_z$    $z = x \odot y$    $z' = x' \odot y'$, (3)

where $x, y, z, x', y', z'$ are unknowns, and $\Delta_x, \Delta_y, \Delta_z$ are parameters. Using this
S-system, we can verify whether the specified input and output patterns
of each operation are compatible. Moreover, we can compute the probability to
reach the specified output pattern by counting the number of solutions. Assuming
that the probabilities of the operations are independent, we can compute the
probability of the full characteristic by multiplying the probabilities of the
operations. We deal with the rotations $y = x \ggg i$ by just rotating the constraint
pattern: if $\delta x = \Delta_x$ then we use $\delta y = \Delta_x \ggg i$.
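To illustrate the counting argument, the sketch below computes the probability of one modular addition by brute force on small words, instead of using the automaton. It only handles the symbols -, x and ? (equal bits, different bits, unconstrained), patterns are written most significant bit first, and all names are ours.

```python
ALLOWED = {
    '-': {(0, 0), (1, 1)},
    'x': {(0, 1), (1, 0)},
    '?': {(0, 0), (0, 1), (1, 0), (1, 1)},
}


def matches(pattern, v, vp):
    """Check that every bit pair of (v, v') satisfies the pattern (MSB first)."""
    n = len(pattern)
    return all(((v >> i) & 1, (vp >> i) & 1) in ALLOWED[pattern[n - 1 - i]]
               for i in range(n))


def pairs(pattern):
    """All (v, v') allowed by the pattern."""
    n = len(pattern)
    return [(v, vp) for v in range(1 << n) for vp in range(1 << n)
            if matches(pattern, v, vp)]


def addition_probability(dx, dy, dz):
    """Fraction of inputs satisfying dx and dy whose sums satisfy dz."""
    n = len(dz)
    mask = (1 << n) - 1
    xs, ys = pairs(dx), pairs(dy)
    good = sum(matches(dz, (x + y) & mask, (xp + yp) & mask)
               for x, xp in xs for y, yp in ys)
    return good / (len(xs) * len(ys))
```

A probability of zero means that the patterns of this operation are incompatible, for example `addition_probability('---x', '---x', '---x')`.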

3 We assume that all the operations except the rotations are S-functions, as is the case
in ARX designs.
3.3 Propagation of Constraints

This approach can also be used to propagate the constraints associated with a
differential characteristic. The main idea is to consider each bit constraint, and
to split it into two disjoint subsets; if one of the subsets results in an incompatible
system, we know that we can restrict the constraint to the other subset without
reducing the number of solutions. More precisely, we use the following splits for
the 1-bit constraints of [?]:

? → -/x, 3/C, 5/A, 0/E, 1/7, u/D, n/B    - → 0/1    x → u/n
3 → 0/u    C → 1/n    5 → 0/n    A → 1/u
7 → 0/x, u/5, n/3    B → 0/A, u/-, 1/3    D → 0/C, n/-, 1/5    E → u/C, n/A, 1/x

For instance, if a bit is specified as ?, we test whether the system is still compatible
when it is restricted to - and to x, respectively. If one of the systems
becomes incompatible, we can turn the ? constraint into x or -, accordingly. If
both are still compatible, we then try to restrict the ? bit to 3 and C, and try
all the available splits.

This will be repeated with the S-systems corresponding to each operation in 
the cipher. We cannot apply this strategy to bigger chunks because the resulting
system would be too large. Still, the constraints found in one system will be 
given as input to other systems involving the same variable, and can generate 
new constraints. The technique will discover necessary constraints, and output 
a characteristic more precise than the input characteristic. 
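On words small enough to enumerate, the outcome of the split tests for a single addition can be obtained directly: list the solutions of the system and record which pairs (x, x') occur at each bit; a pair that never occurs corresponds to a subset that makes the system incompatible. The sketch below takes this brute-force shortcut rather than the automaton-based tests, represents each bit constraint as a set of pairs, and uses names of our own.

```python
ALL = frozenset({(0, 0), (0, 1), (1, 0), (1, 1)})
SYMBOLS = {'-': frozenset({(0, 0), (1, 1)}),
           'x': frozenset({(0, 1), (1, 0)}),
           '?': ALL}


def parse(pattern):
    """Turn an MSB-first pattern into a list of pair sets indexed by bit."""
    return [SYMBOLS[c] for c in reversed(pattern)]


def bit_pairs(v, vp, n):
    return [((v >> i) & 1, (vp >> i) & 1) for i in range(n)]


def pairs_matching(constraint):
    n = len(constraint)
    return [(v, vp) for v in range(1 << n) for vp in range(1 << n)
            if all(p in constraint[i] for i, p in enumerate(bit_pairs(v, vp, n)))]


def propagate_addition(cx, cy, cz):
    """Return refined per-bit constraints for x, y and z = x + y."""
    n = len(cz)
    mask = (1 << n) - 1
    seen = [[set() for _ in range(n)] for _ in range(3)]
    for x, xp in pairs_matching(cx):
        for y, yp in pairs_matching(cy):
            z, zp = (x + y) & mask, (xp + yp) & mask
            if all(p in cz[i] for i, p in enumerate(bit_pairs(z, zp, n))):
                for k, (v, vp) in enumerate(((x, xp), (y, yp), (z, zp))):
                    for i, p in enumerate(bit_pairs(v, vp, n)):
                        seen[k][i].add(p)
    return [[frozenset(s) for s in var] for var in seen]


NX, NY, NZ = propagate_addition(parse('---x'), parse('----'), parse('-???'))
```

In this example the least significant bit of z is refined from ? to x, while bits 1 and 2 stay unconstrained: 1-bit constraints cannot express the relation between consecutive bits of a carry chain.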

This can also be combined with more global techniques such as Section 2.3
of [?], or the “Complete Condition Check” of [?]. When a constraint is split
into two subsets, we can look for contradictions by running the propagation
algorithm on the full path, instead of running it on a single operation. However,
this becomes very expensive for large systems and it can take hours to try to split
each constraint. In this work, we focus on discovering local conditions efficiently,
and we leave the analysis of less local techniques for future work.

All this can be implemented quite efficiently using automata to solve S-systems.
If we build deterministic decision automata, we can test whether a system is compatible
with only n table accesses. This approach is very similar to the technique
used in [?], and explained in more detail in [?] and [?]. The main difference
is that we iterate over all possible choices for the variables only when building
the automaton, not when using it. In previous work, a similar result is achieved
by caching the results of the computations.

4 Multi-bit Constraints 

In this work, we extend this framework by considering constraints on several
consecutive bits, instead of strictly bitwise constraints. This allows us to express
some conditions that occur naturally when considering carry extension, such as
relations between $x^{[i]}$ and $x^{[i-1]}$. Two-bit conditions have already been proposed
in [?], but they are treated separately from the main characteristic. In particular,
two-bit conditions are not used to deduce further constraints through the
propagation algorithm. In our work, multi-bit constraints can only deal with
consecutive bits of a variable, but they are part of the characteristic, and they
can be propagated efficiently.

Table 3. New 1.5-bit constraints

Symbol    Condition
=         $x' = x = 2x$
!         $x' = x \neq 2x$
<         $x' \neq x = 2x$
>         $x' \neq x \neq 2x$


1.5-bit Constraints. First, we consider constraints on pairs of consecutive bits.
Intuitively, this is used to capture the fact that in a carry chain, even if we don’t
know the sign of the modular difference, we know that the active bits all have
the same sign, except the final one. For instance, if we have ---x ⊞ ---- → -xxx,
we know that the output difference must be either -nuu (if the input difference is
---n) or -unn (if the input difference is ---u). We can capture this behaviour
using constraints that link the sign of an active bit to the sign of the previous bit.
In our implementation, we introduce a set of 16 constraints described in Tables 1
and 3: ?, -, x, 0, u, n, 1, #, 3, C, 5, A, =, !, <, >. For instance, the symbol < means
that the current bit is active, and that bit i of x is equal to bit i of 2x, i.e. to
bit i − 1 of x; this can be written as $x'^{[i]} \neq x^{[i]} = x^{[i-1]}$, and it appears in the
middle of a carry chain. The situation of a carry extension with an unknown sign
as in ---x ⊞ ---- → -xxx can now be written more accurately as -><x.

The constraints of Table 3 are written as subsets of $(x^{[i]}, x'^{[i]}, x^{[i-1]})$; we call
them 1.5-bit constraints because we use $x^{[i-1]}$ but we do not use $x'^{[i-1]}$.

2-bit Constraints. The 1.5-bit constraints are quite efficient to capture information
about the carries when the xor difference is known. However, when
the xor difference is not known a priori, we still lose a lot of information. To
overcome this problem, we considered the full set of $2^{16}$ possible constraints
on $(x^{[i]}, x'^{[i]}, x^{[i-1]}, x'^{[i-1]})$, and we discovered an important property: they can
restrict the pair $(x, x')$ to exactly the set of values with any given modular difference.
More precisely, this is achieved using the 10 constraints of Table 4. We
found this set of constraints experimentally, by testing all 8-bit differences.

This is an important result because it allows us to express the modular difference
using only local constraints. Local constraints can easily go through rotations, 
and can be expressed as S-functions. Therefore we can compute the probability 
of a differential characteristic expressed in this way, and we can propagate these 
constraints automatically. 

Table 4. 2-bit constraints sufficient to describe exactly the modular difference

Table 5. New 2.5-bit constraints

We denote the first four constraints as U, V, N and M; the remaining six can be
obtained by combining previous constraints. The most important constraints are
the ones denoted as U and N: they can capture the carry extension of a positive
(resp. negative) modular difference. For instance a modular difference of +1 can
be realized with 4-bit words as ---u, --un, -unn, unnn or nnnn, depending on
the carry extension. For each of the potential carry bits (1-3), we can see that
the difference pattern of bit i and bit i − 1 is always one of --, -u, un, or nn.
Reciprocally, if bits 1-3 follow these patterns, then the full difference has to
be one of the previous patterns, and the modular difference will be +1. The U
constraint corresponds to these patterns.

In our implementation, we only use the U and N constraints, which are sufficient
to express sparse modular differences.
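The +1 example can be replayed by enumeration. The sketch assumes that u marks a bit contributing $+2^i$ to the modular difference and n a bit contributing $-2^i$ (so the symbol is derived from $x'^{[i]} - x^{[i]}$); the variable names are ours.

```python
from itertools import product


def signed_pattern(x, xp, n):
    """Signed difference of (x, x'), MSB first, with digits x'[i] - x[i]."""
    digits = {0: '-', 1: 'u', -1: 'n'}
    return ''.join(digits[((xp >> i) & 1) - ((x >> i) & 1)]
                   for i in reversed(range(n)))


N_BITS = 4

# Patterns of all pairs with modular difference +1 on 4-bit words.
PATTERNS = {signed_pattern(x, (x + 1) % (1 << N_BITS), N_BITS)
            for x in range(1 << N_BITS)}

# Two-symbol windows (bit i, bit i-1) occurring in those patterns.
WINDOWS = {p[j:j + 2] for p in PATTERNS for j in range(N_BITS - 1)}

# Converse: patterns with an active least significant bit whose windows
# all belong to WINDOWS.
REBUILT = {''.join(s) for s in product('-un', repeat=N_BITS)
           if s[-1] != '-'
           and all(''.join(s[j:j + 2]) in WINDOWS for j in range(N_BITS - 1))}
```

Both directions agree: the local windows are enough to recover exactly the five patterns of a +1 difference.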


2.5-bit Constraints. To obtain an efficient technique to study differential char- 
acteristics in ARX constructions, we want to combine the results of the 1.5-bit 
constraints, and the 2-bit constraints. On the one hand, the 1.5-bit constraints 
are constructed from the 1-bit constraints in order to capture information about 
the carry when the sign of the difference is not known. On the other hand, the 2- 
bit constraints can capture exactly the modular difference, but we need to know 
the sign of the difference. We now introduce constraints to capture the modular 
difference when the sign is not known. 
Table 6. Comparison of the constraint sets. We show how simple difference sets can
be encoded with our constraints, and the number of pairs allowed by each constraint.

Diff, carry                1-bit cstr.            1.5-bit cstr.            2-bit cstr.               2.5-bit cstr.
+1, k-bit ($2^{n-k}$)      -unnn ($2^{n-k}$)      -unnn ($2^{n-k}$)        -unnn ($2^{n-k}$)         -unnn ($2^{n-k}$)
±1, k-bit ($2^{n-k+1}$)    -xxxx ($2^n$)          -><<x ($2^{n-k+1}$)      -><<x ($2^{n-k+1}$)       -><<x ($2^{n-k+1}$)
+1, any ($2^n$)            ????x ($2^{2n-1}$)     ????x ($2^{2n-1}$)       UUUUx ($2^n$)             UUUUx ($2^n$)
±1, any ($2^{n+1}$)        ????x ($2^{2n-1}$)     ????x ($2^{2n-1}$)       XXXXx ($2^n \times n$)    ///Xx ($2^{n+1}$)


Following the analysis of the 2-bit constraints, we study the patterns created
by a carry extension with an unknown sign. Using the 1.5-bit constraints, we can
see that the constraints of bits i and i − 1 are either -> or x<. Reciprocally, if
all the bits follow these patterns, this results in a valid carry extension. We denote
the corresponding set of possibilities as /.

As shown in Table 5, we introduce the following new constraints: X =
{--, -x, xx}, U = {--, -u, xn}, N = {--, -n, xu}, / = {->, x<}, \ = {-<, x>}.

For efficiency reasons, we keep a set of only 16 constraints by removing the less 
useful ones: ?, -, x, 0, u, n, 1, =, !, <, >, X, U, N, /, \. 

4.1 Comparison 

To compare the sets of constraints, we show how they can be used in simple
situations in Table 6. We consider 4 situations, where we describe a set of pairs
with a modular difference of ±1:

— First, we assume that we know the sign of the difference, and the length of
the carry (e.g. -----u ⊞ ------ → -xxxxx). In this case all the constraint
systems give an optimal characterization of the set of allowed pairs.

— Second, we assume that we don’t know the sign of the difference, but we
know the length of the carry (e.g. -----x ⊞ ------ → -xxxxx). In this case,
we need constraints on 1.5 bits to optimally capture the relations in the
carry-extended bits.

— Third, we assume that we know the sign of the difference, but we don’t know
the length of the carry (e.g. -----u ⊞ ------ → ??????). In this situation,
the 2-bit constraints can express precisely the modular difference.

— Finally, we assume that we don’t know the sign of the difference, nor the
length of the carry (e.g. -----x ⊞ ------ → ??????). Here, we need constraints
on 2.5 bits to restrict the set of pairs optimally using relations between
the bits.

4.2 Use as S-systems

We also denote the new sets of constraints by $\delta$. Since the definition of $\delta$ only
involves bitwise operations and left-shifts by a few bits, $\delta x = \Delta$ can be written as
an S-system, similar to System (2). We can use the tools of Section 3 to compute
the probability of a characteristic specified with the new constraints, and to propagate
the new constraints, by building the automata associated with the systems
of each operation, as given in (3). These automata are quite large, because the
state of the automata has to include the values of $x^{[i-1]}$, $x'^{[i-1]}$, and $x^{[i-2]}$.

In practice, we implemented the 1.5-bit constraints and the 2.5-bit constraints. 
With the 1.5-bit constraints, we have 5 bits of state for the S-system of the 
addition, but the transition automaton only reaches 16 different states. When 
using the powerset construction to build a deterministic decision automaton, we obtain
12929 states, and the full table takes 102MB. With the 2.5-bit constraints, we 
have 11 bits of state, and the transition automaton reaches 160 different states 
(we cannot build a deterministic decision automaton in this case). 

We could easily include more constraints in our framework, but this set of
symbols is quite expressive, and a larger set of constraints would result in larger
tables. We will see that those constraints give good results in practice. Moreover,
we note that many cases can be expressed using the constraints of two consecutive
bits. For instance, the constraint $x^{[i]} = x'^{[i]} = x^{[i-1]} = 0$ cannot be expressed in
Table 3 but it will be coded with constraint = for bit i, and constraint 3 for bit
i − 1 (if some more information is known for bit i − 1, it will become 0 or u).

When we deal with a rotation, we have to relax the constraints slightly if the
multi-bit constraints are broken by the rotation. For a rotation of i bits to the
right, if the constraint on bit i is one of =, !, < or >, it will be relaxed to -, -, x and x, respectively.
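A sketch of this relaxation on pattern strings (most significant bit first). It applies only the rule stated above, and assumes that the original least significant symbol is a plain 1-bit symbol; the function name is ours.

```python
RELAX = {'=': '-', '!': '-', '<': 'x', '>': 'x'}


def rotate_right(pattern, r):
    """Rotate an MSB-first constraint pattern by r bits to the right.

    Bit r of the input becomes bit 0 of the output, so its constraint can
    no longer refer to the bit just below it and is relaxed to a 1-bit symbol.
    """
    n = len(pattern)
    r %= n
    if r == 0:
        return pattern
    rotated = pattern[-r:] + pattern[:-r]
    lsb = rotated[-1]
    return rotated[:-1] + RELAX.get(lsb, lsb)
```

For example, rotating -><x by one bit gives x->x: the < that has moved to the least significant position becomes x, while > still refers to the bit below it.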


4.3 Propagation of Constraints 

To propagate the new constraints, we need to define how to split the new constraints.
We use the following splits for the 1.5-bit constraints:

? → -/x, 3/C, 5/A    - → 0/1, =/!    x → u/n, </>
3 → 0/u    C → 1/n    5 → 0/n    A → 1/u
= → 0/1    ! → 0/1    > → u/n    < → u/n

For the 2.5-bit constraints, some useful subsets are not included in the 16 constraints,
but can be obtained by restricting both $\Delta^{[i]}$ and $\Delta^{[i-1]}$. We use the
following splits:

? → -/x, X/x-    X → U/Nx, N/Ux, -/xx, //\    N → -/xu
\ → -</x>    / → U/Nx, N/Ux, -/x<    U → -/xn

This approach is quite efficient. As an example, let us consider this system:

$\delta x$ = x--x    $\delta y$ = ----    $z = x \boxplus y$
$\delta u$ = ---x    $\delta v$ = ----    $z = u \boxplus v$    $\delta z$ = -???.

It is easy to see that this system is incompatible when considering modular
differences: the difference in $x \boxplus y$ is ±8 ± 1, while the difference in $u \boxplus v$ is ±1.
However, when using only the xor difference, or the constraints of [?], this system
seems to be compatible, and constraint propagation gives $\delta z$ = -xxx. Using our
new constraints, the algorithm can further deduce $\delta z$ = -<<x from the first
addition and $\delta z$ = -><x from the second addition, and the incompatibility is
detected. Moreover, the incompatibility can be detected without specifying the
difference in z beforehand using the 2.5-bit constraints.

Table 7. Experiments with a few rounds of a 4-bit Skein. We give the number of
input/output differences accepted by each technique, and the ratio of false positives.

Method                              4 rounds (total: $2^{32}$)        6 rounds (sparse¹)
                                    Accepted                 F.p.     Accepted               F.p.
Exhaustive search                   35960536 ($2^{25.1}$)    0        427667 ($2^{18.7}$)    0
2.5-bit constraints                 40820032 ($2^{25.3}$)    0.13     746742 ($2^{19.5}$)    0.7
1.5-bit constraints                 40820032 ($2^{25.3}$)    0.13     1372774 ($2^{20.4}$)   2.2
1-bit constraints                   43564288 ($2^{25.4}$)    0.21     1762857 ($2^{20.7}$)   3.1
Checking additions independently    56484732 ($2^{25.8}$)    0.57

¹ Weight 4 differences. The total number of input/output differences is


4.4 Comparison with Previous Works 

To compare the efficiency of the constraints, we did some experiments with
reduced versions of Skein. We test a set of input and output xor differences, and
we compare several methods to detect if the differences are compatible. We use
small versions so that we can find exact results with exhaustive search. We verify
that no false negatives are found, and we compare how many false positives are
found with each technique.

First we use a reduced Skein with two rounds and 4 words of 4 bits each.
We note that for a two-round Skein, all the intermediary xor differences can be
computed from the input and output xor differences; therefore we have a full
xor differential characteristic. As a reference point, we can check whether each
non-linear operation has a non-zero probability. Our results in Table 7 show that
the assumption of independence of the operations can be quite flawed: we found
many paths where each operation has a non-zero probability, but no pair can
satisfy the differential. This motivates the use of more advanced constraints in
order to extract information from one operation and combine it with another
operation. We also see that our 1.5-bit constraints can detect more problems
than the 1-bit constraints of [?]. In this setting the 2.5-bit constraints are no
better than the 1.5-bit constraints because the xor differences are all known.

We also did experiments with a reduced Skein with three rounds and 4 words
of 6 bits each. We only use sparse differences (less than 4 active bits in the input
and output), because the full space is too large to be exhausted in practice. Our
results are given in Table 7 and show that in this setting, the 2.5-bit constraints
reduce the number of false positives threefold over the 1.5-bit constraints. The
2.5-bit constraints provide much better results than previous works when the
xor difference is not known beforehand.
4.5 Description of Some Cases of Incompatibility

We have developed a graphical tool that can display such a characteristic, and
allows the user to easily modify the characteristic by adding and removing constraints.
The tool can automatically propagate the new constraints, and show
incompatibilities if there are some.

We have studied published differential trails with this tool and we found problems
in several of them. It seems that many characteristics following a natural
construction, and seemingly valid when verified manually, are in fact incompatible.
We will now describe some of the patterns that can lead to unexpected problems.

Problems with Modular Addition. A simple class of problems is related to
the modular additions when using xor differences. Techniques to check the validity
of these operations are well known [?], but in some cases the results are
somewhat unexpected. In particular, the valid differences are quite constrained
in the least significant bit, because the incoming carry is fixed to zero. For instance
the following path is built with a simple linearization, but it is in fact
incompatible:

$\delta a$ = ---x    $\delta b$ = ---x    $\delta c$ = ---x
$x = a \boxplus b \boxplus c$    $\delta x$ = ---x.

More generally, some patterns which seem valid when studied with a signed difference
are in fact incompatible. The characteristic used in a recent near-collision
attack against Skein [21] contains a pattern similar to this one⁴:

$\delta a$ = --xxxxx-    $\delta b$ = ---xx---
$x = a \boxplus b$    $\delta x$ = -xxxx-x-.

This seems valid when considering signed differences: the difference should be
±2 in a, ±8 in b, and ±2 ± 8 in x. In fact, this does not have any solution, and it
does not seem easy to modify the characteristic of [21] to obtain a valid attack.
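Both examples are small enough to be checked exhaustively. The sketch below counts the inputs that follow a xor-differential through a sum of words; the helper names are ours, and the second check assumes the 8-bit reading --xxxxx-, ---xx---, -xxxx-x- of the pattern (differences ±2, ±8 and ±2 ± 8), not the full 64-bit words of the original characteristic.

```python
from itertools import product


def xor_mask(pattern):
    """Convert an MSB-first pattern of '-' and 'x' into a xor difference."""
    return int(pattern.replace('-', '0').replace('x', '1'), 2)


def count_followers(input_patterns, output_pattern):
    """Count the input tuples whose sum follows the xor-differential."""
    n = len(output_pattern)
    mask = (1 << n) - 1
    in_diffs = [xor_mask(p) for p in input_patterns]
    out_diff = xor_mask(output_pattern)
    count = 0
    for values in product(range(1 << n), repeat=len(in_diffs)):
        s = sum(values) & mask
        sp = sum(v ^ d for v, d in zip(values, in_diffs)) & mask
        if s ^ sp == out_diff:
            count += 1
    return count


THREE_TERM = count_followers(['---x', '---x', '---x'], '---x')
SKEIN_LIKE = count_followers(['--xxxxx-', '---xx---'], '-xxxx-x-')
SANITY = count_followers(['---x', '----'], '---x')
```

Both `THREE_TERM` and `SKEIN_LIKE` are zero, while the control case `SANITY` has solutions.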

Problems with Carry Extensions. Carry extensions in modular additions 
generate constraints between consecutive bits which can be detected with our 
framework. For instance, let us consider the following simple path: 

5a = -xx — c = a EE 6 c' = c^> 2 u = d EH d 

5b = xxx — 5c= 5d= — xx- 5u= — xx-. 

The first addition generate a constraint cM ^ (*. e. 5c = - ! ), and the 

second addition generate a constraints dW = c'M (i. e. 5d = — =--). Obviously 
these constraints are contradictory through the rotation. In this example the 
problem will be detected by our new constraints, but not when looking at each 
operation individually, or using the single-bit constraints of [?].

4 This can be found at round 20, in the addition $c_{20} = c_{19} \boxplus d_{19}$, with the following
xor-differences: $\Delta c_{19}$ = 0x020030a0000f80a0, $\Delta d_{19}$ = 0xf8f87ca007f7c7a7, $\Delta c_{20}$ =
0x7ef8f50001104501.
5 Constraints for the Analysis of Boomerang Attacks

We also study differential characteristics in the context of boomerang attacks.
The traditional approach is to specify each characteristic separately, and to assume
that they are all independent. In this work, we consider a boomerang
characteristic mostly as a collection of constraints for the top characteristics and
the bottom characteristics.

Let x be some internal state variable, and $x^{(0)}, x^{(1)}, x^{(2)}, x^{(3)}$ be the corresponding
variables in a boomerang quartet. A boomerang property is built
by specifying a top trail for $(x^{(0)}, x^{(1)})$ and $(x^{(2)}, x^{(3)})$, and a bottom trail for
$(x^{(0)}, x^{(2)})$ and $(x^{(1)}, x^{(3)})$. For more generality, we allow the two characteristics
to be different in each case (e.g., the signs might be different). The top trail will
be mostly unconstrained for the bottom part of the cipher, while the bottom
trail will be mostly unconstrained for the top part.

Unfortunately, the hypothesis of independence might be wrong in practice,
and we can find paths that are impossible to satisfy simultaneously, as shown
by Murphy [?]. In fact, this kind of problem seems to be quite common with
ARX designs, as shown in the case of HAVAL [22], SHA-256 [?], or Skein [?].
To capture this kind of dependency, we use constraints on quartets of variables,
instead of constraints on pairs of variables. We cannot use the full set of $2^{16}$
constraints, because the resulting system is too large, but we use a set of 81
constraints given in Table 8 to specify the xor difference in each of the four sides
of the quartet. For $(i, j)$ in $\{(0,1), (2,3), (0,2), (1,3)\}$, we restrict $x^{(i)} \oplus x^{(j)}$ to 0 or
1, or leave it unrestricted. Note that some constraints are actually contradictory⁵
or redundant⁶, but this uniform set is much easier to work with than a reduced
set without the extra constraints.
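The 81 quartet constraints, and the contradictory and redundant ones among them, can be enumerated directly. In this sketch a constraint is a string of four symbols, one per side in the order (0,1), (2,3), (0,2), (1,3), with - for a xor of 0, x for a xor of 1 and ? for no restriction; this string representation is our own choice for illustration.

```python
from itertools import product

SIDES = [(0, 1), (2, 3), (0, 2), (1, 3)]


def allowed_quartets(pattern):
    """Set of bit quartets (x0, x1, x2, x3) allowed by a 4-symbol constraint."""
    ok = set()
    for q in product((0, 1), repeat=4):
        if all(sym == '?' or (q[i] ^ q[j]) == (1 if sym == 'x' else 0)
               for sym, (i, j) in zip(pattern, SIDES)):
            ok.add(q)
    return frozenset(ok)


PATTERNS = [''.join(p) for p in product('-x?', repeat=4)]
SETS = {p: allowed_quartets(p) for p in PATTERNS}
CONTRADICTORY = [p for p in PATTERNS if not SETS[p]]
DISTINCT = {s for s in SETS.values() if s}
```

Out of the 81 constraints, 8 are contradictory (the xor of the four sides must be 0) and the remaining 73 describe only 41 distinct sets of quartets.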

We use three different kinds of S-systems to propagate constraints in a
boomerang characteristic:

1. systems with multi-bit constraints and non-linear operations in each individual
path, following System (3) (for $(i, j)$ in $\{(0,1), (2,3), (0,2), (1,3)\}$):

$\delta(x^{(i)}, x^{(j)}) = \Delta_x^{i,j}$    $\delta(y^{(i)}, y^{(j)}) = \Delta_y^{i,j}$    $\delta(z^{(i)}, z^{(j)}) = \Delta_z^{i,j}$
$x^{(i)} \boxplus y^{(i)} = z^{(i)}$    $x^{(j)} \boxplus y^{(j)} = z^{(j)}$;

2. systems with quartet constraints and non-linear operations:

$\delta(u^{(0)}, u^{(1)}, u^{(2)}, u^{(3)}) = \Delta_u^{0,1,2,3}$, for all u in $\{x, y, z\}$
$x^{(i)} \boxplus y^{(i)} = z^{(i)}$, for all i in $\{0, 1, 2, 3\}$;

3. systems with multi-bit constraints linking the four variables of a quartet:

$\delta(x^{(0)}, x^{(1)}) = \Delta_x^{0,1}$    $\delta(x^{(2)}, x^{(3)}) = \Delta_x^{2,3}$ (Top path)
$\delta(x^{(0)}, x^{(2)}) = \Delta_x^{0,2}$    $\delta(x^{(1)}, x^{(3)}) = \Delta_x^{1,3}$. (Bottom path)

5 e.g. ---x means $x^{(0)} \oplus x^{(1)} = 0$, $x^{(2)} \oplus x^{(3)} = 0$, $x^{(0)} \oplus x^{(2)} = 0$, $x^{(1)} \oplus x^{(3)} = 1$,
which is impossible.

6 e.g. ---? and ---- allow the same values for the $x^{(i)}$’s.

Table 8. New boomerang constraints

Fig. 1. Example of incompatible characteristics


5.1 Incompatibility in Boomerang Characteristics 

We found that some very simple patterns can lead to incompatibilities. Figure 1
gives an example of a pattern that results in incompatible characteristics. If a 
quartet follows these characteristics, the middle bit of the variables has to satisfy: 

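$x^{(0)} \oplus x^{(1)} = x^{(2)} \oplus x^{(3)} = 1$    $y^{(0)} \oplus y^{(1)} = y^{(2)} \oplus y^{(3)} = 1$ (Top path)

$x^{(0)} \oplus x^{(2)} = x^{(1)} \oplus x^{(3)} = 1$    $y^{(0)} \oplus y^{(2)} = y^{(1)} \oplus y^{(3)} = 0$ (Bottom path)

$x^{(0)} \boxplus y^{(0)} = x^{(1)} \boxplus y^{(1)}$    $x^{(2)} \boxplus y^{(2)} = x^{(3)} \boxplus y^{(3)}$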

We can assume that $x^{(0)} = 0$, and deduce $x^{(1)} = 1$, $x^{(2)} = 1$, $x^{(3)} = 0$. Since the
difference in $(y^{(0)}, y^{(1)})$ must cancel the difference in $(x^{(0)}, x^{(1)})$, we have $y^{(0)} = 1$,
$y^{(1)} = 0$, and we can deduce $y^{(2)} = 1$, $y^{(3)} = 0$. But the difference in $(y^{(2)}, y^{(3)})$
cannot cancel the difference in $(x^{(2)}, x^{(3)})$. A more detailed analysis shows that
this pattern can lead to incompatibilities even if we allow some incoming carries.
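The argument can be replayed by enumerating the middle bit of the eight variables. The sketch assumes that the conditions read as follows: an active bit in x and y on the top path, an active bit in x and an inactive bit in y on the bottom path, cancellation of the differences in both top pairs, and no incoming carry. These conditions are our reading of the pattern, and the variable names are ours.

```python
from itertools import product

SOLUTIONS = []
for x0, x1, x2, x3, y0, y1, y2, y3 in product((0, 1), repeat=8):
    # Top path: differences in (x0, x1), (x2, x3), (y0, y1), (y2, y3).
    top = ((x0 ^ x1) == 1 and (x2 ^ x3) == 1
           and (y0 ^ y1) == 1 and (y2 ^ y3) == 1)
    # Bottom path: difference in x, no difference in y.
    bottom = ((x0 ^ x2) == 1 and (x1 ^ x3) == 1
              and (y0 ^ y2) == 0 and (y1 ^ y3) == 0)
    # The differences must cancel in the sums of both top pairs
    # (same sum bit and same outgoing carry, with no incoming carry).
    cancel = (x0 + y0 == x1 + y1) and (x2 + y2 == x3 + y3)
    if top and bottom and cancel:
        SOLUTIONS.append((x0, x1, x2, x3, y0, y1, y2, y3))
```

The list stays empty: no quartet satisfies the top and bottom conditions at the same time, although each pair of characteristics is valid on its own.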

This pattern seems to appear very frequently when using linearized character-
istics in ARX designs. 


5.2 Application 

We used our tools to verify several boomerang attacks in the literature, and
found some attacks using incompatible paths.

Blake-256. First, we studied the boomerang attacks on Blake from Biryukov
et al. in [3]. When looking at the paths used for the attacks on 7 and 8 rounds of
the keyed permutation, our tool detects an incompatibility. More precisely, when
starting from a middle quartet with the specified differences, and going backward 
through G, 3 , it is impossible to get the specified difference simultaneously in both 
paths. We verified experimentally that we could not find such quartets, even with 
significantly more trials than predicted under the assumption that the paths are 
independent. 

With the help of the authors of [3], we found an alternative path that
gives a valid boomerang attack. More precisely, we modify the top path by using
a difference on bit 25 instead of 31, and rotating all the difference patterns. We
verified experimentally that this leads to a valid attack, but the cost of the attack
becomes higher than reported in [3].

Similarly, for the compression function attacks, our tool detects that the path
used for the 6.5- and 7-round attacks is invalid. We found that this can be corrected
by modifying the top path to use differences on bits 4 and 20 instead of 15
and 31.

Skein-512. We also used our tool to study the boomerang attacks on Skein. We 
start with only the linearized (or almost linearized) xor differential characteristics 
for rounds 12-16 and 16-20, with the key addition in between to provide extra 
freedom, and we use our tool to propagate the constraints. We found that the 
following paths lead to contradictions: 

— The paths for the 32-round attack of |S| ; 

— The paths for the 33- and 34-round attack of (51; 

— The paths for the attack of IH. based on the old rotation constants, and 
inverse permutations; as well as a modified version using the correct permu- 
tations. 

In each case, our tool detects the contradiction automatically. More recently, a
new path has been proposed [50], and a middle quartet was given to show that
the paths are compatible.

6 Conclusion 

In this paper, we study differential characteristics in ARX constructions. We
extend the framework of De Cannière and Rechberger with new constraints.
First we introduce multi-bit constraints that can be propagated more accurately 
through modular addition. We show that a set of 2-bit constraints can express 
exactly the modular difference of a pair of variables, and describe a reduced set 
of 2.5-bit constraints that can express the modular difference in simple cases 
and can also capture the carry extensions of an unsigned difference. Second, we 
introduce new quartet constraints to work with boomerang attacks. 

We provide experimental results showing that our constraints can automatically
detect several cases of incompatibility in differential characteristics undetected
by previous techniques; and we point out several published attacks that
turn out to be invalid. We show that some paths can in fact be incompatible;
this shows the importance of verifying differential attacks.

We hope that the tools will be useful to other cryptanalysts, and they are
available at http://www.di.ens.fr/~leurent/arxtools.html.
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Abstract. Zero-correlation cryptanalysis uses linear approximations 
holding with probability exactly 1/2. In this paper, we reveal fundamen- 
tal links of zero-correlation distinguishers to integral distinguishers and 
multidimensional linear distinguishers. We show that an integral implies 
zero-correlation linear approximations and that a zero-correlation linear 
distinguisher is actually a special case of multidimensional linear dis- 
tinguishers. These observations provide new insight into zero-correlation 
cryptanalysis which is illustrated by attacking a Skipjack variant and 
round-reduced CAST-256 without weak key assumptions. 

Keywords: zero-correlation cryptanalysis, integral distinguishers, mul- 
tidimensional linear distinguishers, Skipjack, CAST-256. 


1 Introduction 

1.1 Zero-Correlation 

Zero-correlation cryptanalysis [3 IB! is a novel promising attack technique for 
block ciphers. The distinguishing property used in zero-correlation cryptanalysis
is the existence of zero-correlation linear approximations over (a part of) the
cipher. Those are linear approximations that hold true with a probability $p$
of exactly 1/2, that is, strictly unbiased approximations having a correlation
$c = 2p - 1$ equal to 0.

The original work [7] provides a simple and efficient technique to find zero-correlation
approximations, but the distinguisher was rather weak. Recently, the
work [8] has proposed a more powerful distinguisher by exploiting the fact that
zero-correlation approximations are numerous in susceptible ciphers. Though working
fine in practice and being useful in cryptanalysis, the distinguisher of [8] has some
constraints that we would like to overcome: (1) If there are $\ell$ zero-correlation linear
approximations for an $n$-bit block cipher, the distinguisher of [8] has to make
$O(2^n/\sqrt{\ell})$ queries. So the data complexity does not go down as fast as $\ell$ grows. (2)
The distinguisher of [8] relies on the assumption that all linear approximations with

* All authors are corresponding authors. 

X. Wang and K. Sako (Eds.): ASIACRYPT 2012, LNCS 7658, pp. 244-g5T] 2012. 
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Integral and Multidimensional Linear Distinguishers with Correlation Zero 245 


correlation zero are independent. In most cases, including the attacks of [8] in fact,
this assumption is formally not met, since all classes of zero-correlation approximations
known so far are actually truncated, building linear spaces of dimension
$\log_2 \ell$. That is, almost all $\ell$ approximations used will be linearly dependent,
formally jeopardizing the assumption, and another theory is needed to support the
zero-correlation distinguisher.


1.2 Our Contributions 

Zero-Correlation and Integrals. Integral distinguishers were originally pro- 
posed by Knudsen as a dedicated attack against the Rijndael-predecessor Square 
02!. Integral distinguishers PU are also known as square distinguishers for this
reason, especially when applied to Square-type ciphers such as AES. Variants of
integral distinguishers include saturation m and multiset distinguishers jS|. Integral
distinguishers mainly make use of the observation that it is possible to fix
some parts of the plaintext such that specific parts of the ciphertext are balanced, 
i.e. each possible partial value occurs the exact same number of times in the output. 

In this paper, we demonstrate that an integral implies zero-correlation linear
approximations, see Fig. 1. In the other direction, a zero-correlation distinguisher
implies an integral distinguisher only if input and output linear masks
in zero-correlation approximations are independent of each other. Note that the 
condition for the input and output masks to be detached from each other im- 
plies that, for instance, the 5-round zero-correlation property of balanced Feistel 
ciphers of 0 is not directly described by an integral. 

In this sense, the fact that integrals imply zero-correlation distinguishers is
especially intriguing, as not only are the distinguishers constructed in different
ways, but the resulting attacks also seem to work in inherently different ways.
In particular, this link allows using $\ell$ input masks and one output mask with
correlation zero in a distinguisher with a data complexity of $2^n/\ell$. Thus, in these
settings the link outlined above reduces the data complexity of zero-correlation
distinguishers by a factor of $\sqrt{\ell}$ (at the price of transforming the
attack into a chosen-plaintext attack) compared to previous works.

Zero-Correlation and Multidimensional Linear Distinguishers. The ba- 
sic idea of multidimensional cryptanalysis |TllTl lTTIin~ r )llT7irTT?| is that, given corre- 
lations of all linear approximations with non-zero correlation on a linear space 
formed by some cipher data, the probability distribution of the cipher data can 
be determined. Then, instead of the statistical behavior of a large set of mu- 
tually dependent linear approximations, one can examine the data distribution. 
Indeed, statistical behavior of multiple linear approximations has been analyzed 
only under the assumption of statistical independence 0. The main advantage 
of the multidimensional approach is that it allows rigorous statistical analysis 
of linear approximations without the independence assumption. In traditional
linear cryptanalysis, the focus is on linear approximations with correlations of
large magnitude. The larger the magnitudes of the correlations, the more
non-uniform the distribution of the cipher data under consideration. The linear
distinguisher is then based on distinguishing the non-uniform cipher data distribution
from a uniform distribution. For a more comprehensive recent survey
on multidimensional linear distinguishers, the reader is referred to e.g. [Ki- 

In this paper, we consider linear spaces of cipher data where correlations of all
linear approximations are equal to zero. Our starting observation here is that in
fact, being truncated, zero-correlation approximations constitute a special case
of multidimensional linear approximations. However, unlike traditional multidimensional
linear distinguishers where the cipher data behaves non-uniformly,
the cipher data for zero-correlation is uniformly distributed. This requires the
development of a statistical theory to distinguish a sample of such cipher data
from a sample of random data drawn from a uniform distribution.

In contrast to [8], the new distinguisher does not need the assumption of
statistical independence for multiple zero-correlation linear approximations. While
still requiring about $O(2^n/\sqrt{\ell})$ cipher queries, it allows taking full advantage of all
zero-correlation linear approximations available, independent or not. The distribution
of the cipher data is accurately modeled as sampling from a multivariate
hypergeometric distribution, while the random data is drawn from a multinomial
distribution. This establishes an inherent link of zero-correlation to multidimensional
linear distinguishers. In their essence, zero-correlation distinguishers constitute
a special case of multidimensional linear-correlation distinguishers, see Fig. 1.
We expect this technique to be useful in the cryptanalysis of many ciphers.



Fig. 1. Relations among distinguishers: zero-correlation, integral, statistical saturation,
and multidimensional linear

Applications: Attacks on Skipjack Variant and CAST-256. To empha- 
size the practical meaningfulness of our findings, we apply the new distinguishers 
to mount key recovery attacks on block ciphers. 

Skipjack is the only block cipher known to be designed by NSA. It is a 32-round
4-line unbalanced Feistel-type network based on interleaving two types
of round functions, Rule A and Rule B. The best known cryptanalytic result
for Skipjack is the impossible differential cryptanalysis for 31 rounds given by
Biham et al. [2] based on a 24-round impossible differential. We change the
order of Rules A and B in Skipjack such that the longest impossible differential
identified is over 21 rounds and show that it has a 30-round zero-correlation
property. We can recover its key for 31 rounds with practical complexity using
an integral zero-correlation attack.

CAST-256 was proposed as an AES candidate. It has 48 rounds. The best
cryptanalysis so far in the classical single-key model without the weak-key
assumption has been a linear attack on 24 rounds. We find 24-round zero-correlation
linear approximations for CAST-256 and attack 28 rounds of CAST-256
using multidimensional zero-correlation cryptanalysis. At the same time, the
longest impossible differential we are aware of is over 18 rounds (though there is
an unspecified impossible differential for 20 rounds mentioned in the literature).
Our multidimensional zero-correlation attack is the first attack on more than half
of the full-round AES candidate CAST-256 without the weak-key assumption.

The remainder of the paper is organized as follows. In Section 2 we introduce
some basic concepts and notions which will be useful throughout the paper.
Section 3 establishes a strong link between the properties of integrals and zero-correlation
approximations. Using an integral zero-correlation distinguisher, Section 4
cryptanalyzes a Skipjack variant resistant to impossible differential attack.
Section 5 describes a link of zero-correlation approximations to multidimensional
linear approximations and introduces a novel zero-correlation multidimensional
linear distinguisher. Section 6 uses it to recover the key of 28 rounds of CAST-256.
We conclude in Section 7.

2 Preliminaries 

2.1 Linear Approximations and Balanced Functions 

$\mathbb{F}_2$ denotes the binary field of two elements and $\mathbb{F}_2^n$ is its extension of dimension
$n$. Let $x, a \in \mathbb{F}_2^n$. Then $\langle a, x \rangle$ denotes their canonical inner product on $\mathbb{F}_2^n$.
Given a function $H : \mathbb{F}_2^n \to \mathbb{F}_2^k$, the correlation $c$ of the linear approximation

\[ \langle b, H(x) \rangle + \langle a, x \rangle \]

for a $k$-bit output mask $b$ and an $n$-bit input mask $a$ is defined by

\[ \Pr(\langle b, H(x) \rangle + \langle a, x \rangle = 0) = \frac{1 + c}{2}, \]

where the probability is taken over all choices of inputs $x$. A related measure for
this correlation is the Walsh or Fourier transformation, defined as

\[ \hat{H}(a, b) = \sum_{x \in \mathbb{F}_2^n} (-1)^{\langle b, H(x) \rangle + \langle a, x \rangle}. \]

The fundamental relation between the Fourier transformation of $H$ and the
correlation of the linear approximation is given by

\[ c = 2^{-n} \hat{H}(a, b), \]

and, thus, studying the correlation and studying the Fourier transformation are,
up to scaling, equivalent.

We say a function $F : \mathbb{F}_2^n \to \mathbb{F}_2^k$ is balanced if all preimages have identical size,
i.e. if the size of the set

\[ F^{-1}(y) := \{ x \in \mathbb{F}_2^n \mid F(x) = y \} \]

is independent of $y$. Note that $F$ being balanced implies $k \le n$. We recall the
following well-known characterization of balanced functions, see for example fTTil 
Proposition 2]: A function $F : \mathbb{F}_2^n \to \mathbb{F}_2^k$ is balanced if and only if all its component
functions are balanced, that is, if and only if for any non-zero $b \in \mathbb{F}_2^k$ it
holds that $\hat{F}(0, b) = 0$.
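
To make these definitions concrete, the following Python sketch computes Walsh coefficients and correlations by exhaustive summation and compares the counting definition of balancedness with the Walsh characterization. It is only an illustration for very small $n$; the helper names and the two toy functions F and G are ours, and bit vectors are encoded as integers.

```python
def dot(a, b):
    """Canonical inner product on F_2^n, with bit vectors encoded as integers."""
    return bin(a & b).count("1") & 1


def walsh(H, n, a, b):
    """Walsh coefficient: sum over all x of (-1)^(<b,H(x)> + <a,x>)."""
    return sum((-1) ** (dot(b, H(x)) ^ dot(a, x)) for x in range(1 << n))


def correlation(H, n, a, b):
    """Correlation of <b,H(x)> + <a,x>, i.e. the Walsh coefficient scaled by 2^-n."""
    return walsh(H, n, a, b) / (1 << n)


def is_balanced(F, n, k):
    """Counting definition: every k-bit output value has the same number of preimages."""
    counts = [0] * (1 << k)
    for x in range(1 << n):
        counts[F(x)] += 1
    return len(set(counts)) == 1


def balanced_via_walsh(F, n, k):
    """Walsh characterization: the coefficient at (0, b) vanishes for all non-zero b."""
    return all(walsh(F, n, 0, b) == 0 for b in range(1, 1 << k))


F = lambda x: x >> 1          # 3 bits -> 2 bits, drops the low bit (balanced)
G = lambda x: x & (x >> 1)    # 3 bits -> 2 bits, not balanced

print(is_balanced(F, 3, 2), balanced_via_walsh(F, 3, 2))
print(is_balanced(G, 3, 2), balanced_via_walsh(G, 3, 2))
print(correlation(F, 3, 0b010, 0b01))
```

For F the approximation with input mask 010 and output mask 01 always holds, so its correlation is 1, while every approximation of F with zero input mask and non-zero output mask has correlation zero.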


2.2 Decomposition of the Target Cipher 

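Assume that $H : \mathbb{F}_2^n \to \mathbb{F}_2^k$ is (a part of) a cipher. To simplify notation and without
loss of generality we split the inputs and outputs into two parts each:

\[ H : \mathbb{F}_2^r \times \mathbb{F}_2^s \to \mathbb{F}_2^t \times \mathbb{F}_2^u, \quad H(x, y) = (H_1(x, y), H_2(x, y)). \]

Furthermore, the function $T_\lambda$ defined by

\[ T_\lambda : \mathbb{F}_2^s \to \mathbb{F}_2^t, \quad T_\lambda(y) = H_1(\lambda, y) \]

will play a key role. The function $T_\lambda$ is the function $H$ when the first $r$ bits of its
input are fixed to $\lambda$ and only the first $t$ bits of the output are taken into account.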


Table 1. Defining properties of some important distinguishers

Distinguisher             Defining property
multidimensional linear   $\sum_{a_1, b_1} \hat{H}(a, b)^2$ non-random
statistical saturation    $\forall \lambda$: $\sum_{b_1} \hat{T}_\lambda(0, b_1)^2$ non-random
integral                  $\forall \lambda, b_1$: $\hat{T}_\lambda(0, b_1) = 0$
zero-correlation          $\forall a_1, b_1$: $\hat{H}(a, b) = 0$


2.3 Distinguishers and Relations 

Here we briefly outline the concepts behind four types of relevant distinguishers
that we will be dealing with in this paper, which are also summarized in Table 1.
A zero-correlation distinguisher uses the property that, for all input and output
masks $a = (a_1, 0)$ and $b = (b_1, 0)$, the Fourier transformation of the cipher yields
zero, $\hat{H}(a, b) = 0$. An integral distinguisher is based on the property that, for all
partial input fixations $\lambda$, the partial function of the cipher with this fixation
is balanced in parts of its output. A multidimensional linear distinguisher relies
upon the property that multiple Fourier coefficients of the cipher behave in a
non-random way, i.e. $\sum_{a_1, b_1} \hat{H}(a, b)^2$ is non-random. A statistical saturation
distinguisher builds upon the property that, for all partial input fixations $\lambda$, the
partial function of the cipher with this fixation is non-random under Fourier
transformation, i.e. $\sum_{b_1} \hat{T}_\lambda(0, b_1)^2$ is non-random. While statistical saturation
and multidimensional linear distinguishers concentrate on the cumulative properties
holding for the partial Fourier spectra, integral and zero-correlation distinguishers
deal with a set of individual properties of Fourier coefficients.

3 Zero-Correlation and Integral Distinguishers 

3.1 Conditional Equivalence Result 

We start by stating the main result of this section, which is summarized in the 
following statement: 

Proposition 1. If the input and output linear masks $a$ and $b$ are independent,
the approximation $\langle b, H(x) \rangle + \langle a, x \rangle$ has correlation zero for any $a = (a_1, 0)$ and
any $b = (b_1, 0) \neq 0$ (zero-correlation) if and only if the function $T_\lambda$ is balanced
for any $\lambda$ (integral).

This basically means that, at least in terms of their defining properties, integral
distinguishers imply zero-correlation distinguishers. The proof of Proposition 1
full version of this paper jHj . The tools used in the proofs mainly originate from 
results in the area of Boolean functions 1221 For instance, Lemma 0 is stated in 
different notation e.g. in HU Proposition 9]). 

The main technical tool is the next lemma linking the correlation of $T_\lambda$ to the
correlation of $H$.

Lemma 1. With the notation from above, the following holds for any $\lambda$, $b_1$:

\[ 2^r \hat{T}_\lambda(0, b_1) = \sum_{a_1 \in \mathbb{F}_2^r} (-1)^{\langle a_1, \lambda \rangle} \hat{H}((a_1, 0), (b_1, 0)) \qquad (1) \]

Lemma 1 already proves one direction of Proposition 1, namely, that zero-correlation
approximations imply an integral under the condition that $b_1$ remains
the same with the change of $a_1$. Lemma 1 is also especially useful for
defining an integral distinguisher that is based on zero-correlation properties:
Given a number of zero-correlation linear approximations (on the right-hand
side of (1)), one checks if the corresponding partial function of the cipher is balanced
(the left-hand side of (1)). This can be done for each partial input fixation
$\lambda$ separately.

The following direct corollary of Lemma 1 is even more telling and is the key
in exhibiting the close link between zero-correlation distinguishers and integral
distinguishers:

Lemma 2. The following holds for any $b_1$:

\[ 2^r \sum_{\lambda \in \mathbb{F}_2^r} \hat{T}_\lambda(0, b_1)^2 = \sum_{a_1 \in \mathbb{F}_2^r} \hat{H}((a_1, 0), (b_1, 0))^2 \]

This lemma proves both directions of Proposition 1, including the fact that an
integral implies zero-correlation distinguishers. In the sequel, we provide a more
detailed description of the link and an example.
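
Both lemmata are identities between finite sums, so they can be checked numerically on a small random function. The sketch below is a sanity check under our own conventions, not part of the proofs: the first $r$ input bits are the fixed part, the first $t$ output bits are the observed part, and each identity carries the factor $2^r$ that follows from summing over all $a_1 \in \mathbb{F}_2^r$. The lookup table TABLE and all function names are hypothetical.

```python
import random


def dot(a, b):
    """Inner product on F_2^n, with bit vectors encoded as integers."""
    return bin(a & b).count("1") & 1


r, s, t, u = 2, 3, 2, 3
rng = random.Random(1)
# Random toy function H : F_2^r x F_2^s -> F_2^t x F_2^u stored as a lookup table.
TABLE = [rng.randrange(1 << (t + u)) for _ in range(1 << (r + s))]


def H1(x, y):
    """First t output bits of H(x, y)."""
    return TABLE[(x << s) | y] >> u


def walsh_H(a1, b1):
    """Walsh coefficient of H at masks ((a1, 0), (b1, 0))."""
    return sum((-1) ** (dot(b1, H1(x, y)) ^ dot(a1, x))
               for x in range(1 << r) for y in range(1 << s))


def walsh_T(lam, b1):
    """Walsh coefficient of T_lam(y) = H1(lam, y) at masks (0, b1)."""
    return sum((-1) ** dot(b1, H1(lam, y)) for y in range(1 << s))


def lemma1_holds(lam, b1):
    lhs = (1 << r) * walsh_T(lam, b1)
    rhs = sum((-1) ** dot(a1, lam) * walsh_H(a1, b1) for a1 in range(1 << r))
    return lhs == rhs


def lemma2_holds(b1):
    lhs = (1 << r) * sum(walsh_T(lam, b1) ** 2 for lam in range(1 << r))
    rhs = sum(walsh_H(a1, b1) ** 2 for a1 in range(1 << r))
    return lhs == rhs


print(all(lemma1_holds(lam, b1) for lam in range(1 << r) for b1 in range(1 << t)))
print(all(lemma2_holds(b1) for b1 in range(1 << t)))
```

Since the second identity equates two sums of squares, one side vanishes exactly when every term on the other side does, which is the step that links balancedness of all $T_\lambda$ to zero correlation.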

3.2 From Zero-Correlation to Integral Distinguishers 
(Conditional)... 

First, assume that $H : \mathbb{F}_2^n \to \mathbb{F}_2^k$ is (a part of) a cipher vulnerable to zero-correlation
attacks. More precisely, assume that for any $a = (a_1, 0)$ and any
$b = (b_1, 0) \neq 0$ the relation $\langle b, H(x) \rangle + \langle a, x \rangle$ has correlation zero. We’d like
to highlight two points here. The restriction to masks of the form $a = (a_1, 0)$
and $b = (b_1, 0)$, that is, to masks where the last bits are fixed to zero, is
solely for simplicity of notation. However, the zero-correlation distinguishers
considered here are a special case: We assume not only that the input
and output masks used form subspaces but also that this space of input and output
masks is actually the direct product of the space of input masks and the space of
output masks. Informally, the masks must not be coupled as they are, for example,
in the attack on CAST-256 described in Section 6. We call such uncoupled
input-output masks, for which our equivalence result applies, detached masks.

Under those conditions, it follows from Lemma 2 above that $\hat{T}_\lambda(0, b_1)$ equals
zero for all $b_1 \neq 0$ and all $\lambda$. This yields that, for any $\lambda$, the function $T_\lambda$ mapping
$s$ bits to $t$ bits is balanced. In other words, $H$ exhibits the following integral
distinguisher: Fixing the first $r$ bits of the input of $H$ arbitrarily and encrypting all
$2^s$ remaining possible plaintexts, each possible $t$-bit string occurs equally often in the first
$t$ bits of the output of $H$. In the particular case of $s = t$, the function $T_\lambda$ is a
permutation and, thus, each possible $t$-bit string should occur exactly once.

3.3 ...And Back Again (Unconditional) 

On the other hand, let us consider the case of a cipher that is vulnerable to an integral
distinguisher in the following sense. Assume that, by fixing some (without
loss of generality, the first $r$) bits in the input and encrypting all possible remaining
plaintexts, one can identify a subset of $t$ output bits (again without loss of generality,
the first $t$ bits) in which each possible $t$-bit string occurs equally often. Then $H$ is also
vulnerable to a zero-correlation attack. More precisely, $\hat{H}((a_1, 0), (b_1, 0)) = 0$ for
all $a_1 \in \mathbb{F}_2^r$ and all non-zero $b_1 \in \mathbb{F}_2^t$. Again, this follows directly from Lemma 2. In fact, an
integral unconditionally implies zero-correlation.
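
As a small illustration of this direction, the following Python sketch builds a toy function whose restriction to every fixed input part is balanced, and then checks exhaustively that all approximations with masks of the form $((a_1, 0), (b_1, 0))$, $b_1 \neq 0$, have Walsh coefficient zero. The construction (one random permutation per fixed part, truncated to its top $t$ bits) and the names PERMS, is_integral and max_abs_walsh are our assumptions, chosen only to obtain an integral property by design.

```python
import random


def dot(a, b):
    """Inner product on F_2^n, with bit vectors encoded as integers."""
    return bin(a & b).count("1") & 1


r, s, t = 3, 3, 2
rng = random.Random(2012)
# One random permutation of the s-bit values per fixed part lam (hypothetical toy cipher).
PERMS = [rng.sample(range(1 << s), 1 << s) for _ in range(1 << r)]


def H1(lam, y):
    """First t output bits: the top t bits of the permutation selected by lam."""
    return PERMS[lam][y] >> (s - t)


def is_integral():
    """True if, for every lam, each t-bit value is hit equally often by y -> H1(lam, y)."""
    for lam in range(1 << r):
        counts = [0] * (1 << t)
        for y in range(1 << s):
            counts[H1(lam, y)] += 1
        if len(set(counts)) != 1:
            return False
    return True


def max_abs_walsh():
    """Largest absolute Walsh coefficient over all a1 and all non-zero b1."""
    return max(
        abs(sum((-1) ** (dot(b1, H1(x, y)) ^ dot(a1, x))
                for x in range(1 << r) for y in range(1 << s)))
        for a1 in range(1 << r) for b1 in range(1, 1 << t))


print(is_integral(), max_abs_walsh())
```

Here no condition on the masks had to be imposed: once every restriction is balanced, the inner sum over $y$ vanishes for each fixed $x$, whatever $a_1$ is.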


3.4 Discussion of the Link 

As pointed out, this relation is intriguing as zero-correlation distinguishers and 
integral distinguishers are constructed quite differently. Moreover, not only the 
ways the distinguishers are constructed are different but also the ways the re- 
sulting attacks work seem inherently different. 

The first difference is that zero-correlation attacks are usually known-plaintext
attacks (or use known distinct plaintexts), while integral attacks are usually
chosen-plaintext attacks. Moreover, for zero-correlation attacks, appending rounds
before the distinguisher normally does not increase the data complexity. On the
other hand, appending rounds before an integral distinguisher often results in an
increased data complexity as, for each (partial) key guess, one has to ensure that
some values are fixed according to the distinguisher. Finally, integral distinguishers
have the advantage that it is often possible to extend the distinguisher by relaxing
the balanced property to a zero-sum property (or equivalently to the fact that a
certain subfunction does not have maximal algebraic degree). For zero-correlation
attacks, such an extension is not known so far.

Thus, besides being interesting from a theoretical perspective, the above men- 
tioned link clearly calls for further work on combining the specific advantages 
offered by both attacks. 

Before discussing an application of this relation to mount an integral attack on 
a variant of Skipjack, we’d like to illustrate the above with AES as an example. 


3.5 Example with AES 

Fig. 2 depicts the well-known 3-round integral distinguisher for AES. Starting
with one active byte and fixing all other bytes results in all bytes being active
after ShiftRows in the third round. In terms of zero-correlation distinguishers,
the above discussion implies that for any non-zero input mask with (at least)
one zero byte and any non-zero output mask which is zero in all but one byte
the corresponding linear approximation is unbiased.






Fig. 2. The integral distinguisher on 3 rounds of AES. The X denotes an active byte.

Fig. 3. Zero correlation distinguisher on 4 rounds of AES. The N denotes a non-zero byte in the mask.


Reciprocally, Fig. 3 shows the 4-round zero-correlation distinguisher from [7].
For any non-zero input mask which is zero in all but one byte and any output mask
satisfying the same condition, the corresponding linear approximation is unbiased.
Now, again using the above discussion, this implies the following integral distinguisher
on 4 rounds of AES. Fix any byte in the plaintext and encrypt all
remaining $2^{120}$ possible plaintexts. Check if the output restricted to any byte
is a balanced function, that is, if each of the 256 possible values is
obtained exactly $2^{112}$ times. Note that this distinguisher was implicitly used for
example in m- 

4 Integral Zero-Correlation for a Skipjack Variant 

4.1 Skipjack-BABABABA vs the Original Skipjack-AABBAABB

Skipjack m is the only block cipher known to be designed by NSA. Skipjack is 
a 64-bit block cipher with an 80-bit key. It is an unbalanced Feistel network with 
32 rounds of two types, called Rule A and Rule B. Each round is described in the 
form of a linear feedback shift register with additional non-linear keyed G per- 
mutation. Rule B is basically the inverse of Rule A with minor positioning differ- 
ences. Skipjack applies eight rounds of Rule A, followed by eight rounds of Rule 
B, followed by another eight rounds of Rule A, followed by another eight rounds 
of Rule B. We refer to this original Skipjack algorithm as Skipjack-AABBAABB,
with A denoting four rounds of Rule A and B standing for four rounds of Rule B.
The best known cryptanalytic result for the original Skipjack-AABBAABB is
the impossible differential cryptanalysis for 31 rounds given by Biham et al. [2]
based on a 24-round impossible differential.

In Skipjack-BABABABA, four rounds of Rule B are applied first, followed by 
four rounds of Rule A, followed by another four rounds of Rule B, followed by 
another four rounds of Rule A. The rest of the cipher is exactly as in Skipjack-AABBAABB,
amounting to 32 rounds in total. See Fig. 4. Skipjack variants
involving the change of order of Rules A and B were studied in P3EE31 . Though 
it was suggested that putting Rule B before Rule A might facilitate truncated 
differentials as a matter of principle, no attacks have been reported on Skipjack- 
BABABABA. 

For Skipjack-BABABABA, the longest impossible differential we can find is
over 21 rounds and covers fewer rounds than the 24-round impossible differential
for the original Skipjack. However, in the following, we derive 30-round
zero-correlation linear approximations for Skipjack-BABABABA.

4.2 Zero-Correlation Linear Approximations for 30 Rounds of
Skipjack-BABABABA

Let the input mask for the first round be $(L_1, L_1, 0, 0)$ and the output mask for
the last round be $(L_2, L_2, 0, 0)$ for any non-zero $L_1$ and $L_2$. Fig. 4(b) depicts the evolution
of both masks from the top and from the bottom towards the middle of the
cipher. In the figure, $M_i$ denotes an undetermined non-zero mask and $R_i$ denotes
an undetermined mask (zero or non-zero). From the input mask $(L_1, L_1, 0, 0)$ at
the first round, the output mask of the 19-th round is $(M_4, R_2, R_1, M_5)$. From
the output mask $(L_2, L_2, 0, 0)$ at the 30-th round, the input mask of the 20-th
round is $(M_7, 0, 0, 0)$. Here we conclude that $(M_4, R_2, R_1, M_5) \neq (M_7, 0, 0, 0)$, as
equality would imply that $M_5 = 0$, contradicting $M_5 \neq 0$. Therefore, the
linear hull of the 30-round linear approximation $(L_1, L_1, 0, 0) \to (L_2, L_2, 0, 0)$
does not contain linear trails of non-zero correlation contribution and, thus, has
correlation zero.

Fig. 4. Integral zero-correlation cryptanalysis of 31-round Skipjack-BABABABA.
(b) 30-round zero-correlation linear approximations for Skipjack-BABABABA ($M_i$ denotes an undetermined non-zero mask and $R_i$ denotes an undetermined mask, zero or non-zero).
(c) Integral zero-correlation attack on 31-round Skipjack-BABABABA (values in brackets are masks, plain values are actual data processed).

Property 1. In Skipjack-BABABABA, each linear approximation of the form
$(L_1, L_1, 0, 0) \to (L_2, L_2, 0, 0)$ for non-zero $L_1$ and $L_2$ over the 30 rounds
$B^3ABABABA^3$ has zero correlation. Here $B^3ABABABA^3$ means that the 30
rounds start with three consecutive rounds of Rule B, followed by ABABAB
and by three consecutive rounds of Rule A.

4.3 Zero-Correlation Integral Attack on 31-Round
Skipjack-BABABABA

Here we describe how to use Proposition 1 to attack 31 rounds of Skipjack-BABABABA
using an integral distinguisher. Combining Proposition 1 with
Property 1 leads to the following distinguisher.

Corollary 1. With the notation of Fig. 4(c), for the 30-round Skipjack-$B^3ABABABA^3$,
encrypting all $2^{48}$ plaintexts of the form $(P_1|P_2|P_3|P_1)$, each
of the $2^{16}$ possible values of $v_2 \oplus v_3$ occurs exactly $2^{32}$ times.

With the notation of Fig. 4(c), this distinguisher can now be used directly to mount
a key-recovery attack on the 31 rounds of Skipjack-$B^3ABABABA$ as follows.

- Initialize $2^{32}$ counters $V_1[C_2|C_3]$ to zero.

- Encrypt each of the $2^{48}$ plaintexts of the form $(P_1|P_2|P_3|P_1)$, and increase
$V_1[C_2|C_3]$ by one.

- For each guess of the $2^{32}$ possible values for $k$:

• Initialize $2^{16}$ counters $V_2[v]$ to zero.

• Decrypt all $2^{16}$ values of $C_2$ to get $v_2|v_3$ and increase $V_2[v_2 \oplus v_3]$ by
$V_1[C_2|C_3]$.

• If one of the counters $V_2[v] \neq 2^{32}$, discard $k$ as a wrong key guess.

With high probability only the correct guess for $k$ will not be discarded. As the
key size for Skipjack is 80 bits, the remaining key bits can be brute-forced with
a complexity of $2^{48}$. The time complexity of this attack is roughly $2^{49}$ Skipjack
encryptions and we have to store roughly $2^{32}$ counters. The data complexity is
$2^{48}$ chosen plaintexts. Thus, this attack has practical complexities.
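
The counter-based key filtering above can be sketched on a toy example. The Python code below is not Skipjack: it uses 4-bit words, a hypothetical last round that maps $v_2$ to SBOX[$v_2$] xor $k$ and leaves $v_3$ unchanged, and a hypothetical inner part (PERM) that makes $v_2 \oplus v_3$ take each value exactly once. It only mirrors the structure of the procedure: fill the counters $V_1$ from the ciphertexts, then for each key guess accumulate them into $V_2$ and discard the guess if some counter deviates from the expected value.

```python
# Toy 4-bit S-box (a permutation of 0..15) and its inverse; hypothetical stand-ins.
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8, 0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]
SINV = [SBOX.index(v) for v in range(16)]
# Permutation that makes v2 xor v3 balanced over the 16 toy "plaintexts".
PERM = [(7 * p + 3) % 16 for p in range(16)]


def last_round(v2, v3, k):
    """Hypothetical keyed last round: only the first word is transformed."""
    return SBOX[v2] ^ k, v3


def undo_last_round(c2, c3, k):
    """Partial decryption of the last round under the key guess k."""
    return SINV[c2 ^ k], c3


def recover_key_candidates(ciphertexts, expected):
    # Step 1: counters V1 indexed by the ciphertext words (C2, C3).
    counters_v1 = {}
    for c2, c3 in ciphertexts:
        counters_v1[(c2, c3)] = counters_v1.get((c2, c3), 0) + 1
    survivors = []
    for k in range(16):
        # Step 2: counters V2 indexed by v2 xor v3, filled from V1 under guess k.
        counters_v2 = [0] * 16
        for (c2, c3), count in counters_v1.items():
            v2, v3 = undo_last_round(c2, c3, k)
            counters_v2[v2 ^ v3] += count
        # Step 3: keep k only if every value occurs exactly as often as expected.
        if all(c == expected for c in counters_v2):
            survivors.append(k)
    return survivors


TRUE_KEY = 0xA
data = [last_round(p, p ^ PERM[p], TRUE_KEY) for p in range(16)]
print(recover_key_candidates(data, expected=1))
```

The correct key always survives, since under it the recovered values of $v_2 \oplus v_3$ are exactly balanced. How many wrong guesses are discarded depends on the toy components and is not claimed here.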

5 Zero-Correlation and Multidimensional Linear 
Distinguishers 

5.1 Multidimensional Linear Setting 

Given $m$ linear approximations

\[ \langle u_i, x \rangle + \langle w_i, y \rangle, \quad i = 1, \ldots, m, \]

where $x \in \mathbb{F}_2^n$ is the plaintext and $y \in \mathbb{F}_2^t$ is some part of the data in the encryption
process, one obtains an $m$-tuple of bits by evaluating those for a plaintext-ciphertext
pair. Instead of considering each such bit and its distribution independently as
$x$ varies, multidimensional linear cryptanalysis focuses on the analysis of the
distribution of the $m$-tuples

\[ z = (z_1, \ldots, z_m), \quad z_i = \langle u_i, x \rangle + \langle w_i, y \rangle. \]

Then we have the following relationship between the probability distribution of
$z$ and the correlations $c_\gamma$ of all linear approximations $\gamma \in \mathbb{F}_2^m$:

\[ \Pr[z] = 2^{-m} \sum_{\gamma \in \mathbb{F}_2^m} (-1)^{\langle \gamma, z \rangle} c_\gamma. \qquad (2) \]

Note that this is actually the key in proving that for a balanced function all
component functions have zero-correlation.

We denote by $U$ and $W$ the $m \times n$ and $m \times t$ matrices with rows $u_i$ and $w_i$,
respectively. Then we have $z = Ux + Wy$ and can write

\[ \langle \gamma, z \rangle = \langle \gamma, Ux + Wy \rangle = \langle U^T \gamma, x \rangle + \langle W^T \gamma, y \rangle, \qquad (3) \]

where $U^T \gamma$ and $W^T \gamma$ are linear combinations of the linear masks $u_i$ and $w_i$,
$i = 1, \ldots, m$, respectively.
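
The relation between the distribution of $z$ and the correlations of all linear combinations of the base approximations can be verified exhaustively for a small example. In the sketch below, the 4-bit permutation E stands in for the map from $x$ to $y$, and the masks U and W are arbitrary choices of ours; exact rational arithmetic is used so that both sides can be compared for equality.

```python
from fractions import Fraction
import random


def dot(a, b):
    """Inner product on F_2^n, with bit vectors encoded as integers."""
    return bin(a & b).count("1") & 1


n, m = 4, 2
rng = random.Random(7)
E = rng.sample(range(1 << n), 1 << n)   # toy permutation standing in for y = E(x)
U = [0b0011, 0b0101]                    # input masks u_1, u_2 (arbitrary)
W = [0b1000, 0b0110]                    # output masks w_1, w_2 (arbitrary)


def z_of(x):
    """Pack z_i = <u_i,x> + <w_i,y> into an integer; bit i-1 holds z_i."""
    y = E[x]
    return sum((dot(U[i], x) ^ dot(W[i], y)) << i for i in range(m))


def combined_masks(gamma):
    """The masks U^T gamma and W^T gamma: XOR of the rows selected by gamma."""
    u = w = 0
    for i in range(m):
        if (gamma >> i) & 1:
            u ^= U[i]
            w ^= W[i]
    return u, w


def corr(gamma):
    """Correlation c_gamma of the combined approximation selected by gamma."""
    u, w = combined_masks(gamma)
    total = sum((-1) ** (dot(u, x) ^ dot(w, E[x])) for x in range(1 << n))
    return Fraction(total, 1 << n)


def prob_direct(z):
    """Pr[z] obtained by counting over all plaintexts."""
    return Fraction(sum(z_of(x) == z for x in range(1 << n)), 1 << n)


def prob_from_correlations(z):
    """Pr[z] obtained from the correlations of all 2^m linear combinations."""
    return sum((-1) ** dot(gamma, z) * corr(gamma) for gamma in range(1 << m)) / (1 << m)


print(all(prob_direct(z) == prob_from_correlations(z) for z in range(1 << m)))
```

If all $c_\gamma$ with $\gamma \neq 0$ were zero, only the term $\gamma = 0$ with $c_0 = 1$ would remain, and every value $z$ would have probability $2^{-m}$.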

5.2 How to Make Zero-Correlation Multidimensional 

Now we are ready to formulate the zero-correlation distinguishing property as a
special case of the multidimensional distinguishing property.

A zero-correlation distinguisher assumes that the correlations of all linear approximations
$\langle u_i, x \rangle + \langle w_i, y \rangle$, $i = 1, \ldots, m$, and their nonzero linear combinations
are equal to zero. (Note that this means, in particular, that these $m$ linear
approximations are statistically independent.) By (3), it follows that $c_\gamma = 0$ for
all $\gamma \neq 0$. When substituting this information in the formula for $\Pr[z]$ in (2), we
obtain that $z$ has a uniform distribution in $\mathbb{F}_2^m$.

Let the adversary be given $N$ distinct plaintexts for an $n$-bit block cipher and
$m$ linear approximations such that all their nonzero linear combinations have
correlation zero. Then he can construct, as shown above, a function from $\mathbb{F}_2^n$
to $\mathbb{F}_2^m$ whose outputs $z$ computed for all plaintexts are uniformly distributed
$m$-tuples of bits in $\mathbb{F}_2^m$.

Such a completely uniform distribution is very unlikely to be obtained by
selecting the values at random in $\mathbb{F}_2^m$, even if each value is equally probable.
The $\ell = 2^m$ zero-correlation approximations span a linear space of dimension $m$.
As we will see, it is possible to distinguish the non-random behavior of the
cipher data with much less data than the full codebook. The distribution of the
cipher data follows a multivariate hypergeometric distribution, while data drawn
at random from a uniform distribution on $\mathbb{F}_2^m$ follow a multinomial distribution.
These distributions have essentially different parameters for large sample sizes
$N$ and can be distinguished from each other. The distinguisher can be obtained
as follows.

5.3 Multidimensional Distinguisher for Correlation Zero

For each of the $2^m$ data values $z \in \mathbb{F}_2^m$, the attacker initializes a counter $V[z]$,
$z = 0, 1, 2, \ldots, 2^m - 1$, to zero. Then, for each distinct plaintext, the attacker
computes the corresponding data value in $\mathbb{F}_2^m$ (by evaluating the $m$ basis linear
approximations) and increments the counter $V[z]$ of this data value by one. Then
the attacker computes the statistic $T$ for this distribution as

$T = \sum_{i=0}^{2^m - 1} \frac{(V[i] - N 2^{-m})^2}{N 2^{-m} (1 - 2^{-m})}$. (4)
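
The counting procedure and the statistic of equation (4) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `zc_statistic` is hypothetical, the observed values are passed in as a plain list, and $m$ is small enough to enumerate all $2^m$ counters.

```python
from collections import Counter


def zc_statistic(z_values, m):
    """Statistic T of equation (4) for a list of observed m-bit values z.

    z_values holds one integer in [0, 2^m) per distinct plaintext.
    """
    n_samples = len(z_values)                 # N
    expected = n_samples / 2**m               # N * 2^-m
    counts = Counter(z_values)                # V[z]; unseen values count as 0
    total = sum((counts.get(z, 0) - expected) ** 2 for z in range(2**m))
    return total / (expected * (1 - 2**-m))
```

A perfectly uniform set of counters gives $T = 0$, which is the behavior expected from the cipher once the full codebook is used.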


The statistic $T$ will have two distinct distributions for the cipher exhibiting
zero correlation and for a randomly drawn permutation, which is our wrong-key
hypothesis assumption:


Proposition 2. For sufficiently large sample size $N$ and number $\ell$ of zero-
correlation linear approximations given for the cipher, the statistic $T$ approximately
follows a $\chi^2$-distribution for the cipher with mean and variance

$\mu_0 = \mathrm{Exp}(T_{cipher}) = (\ell - 1) \frac{2^n - N}{2^n - 1}$ and $\sigma_0^2 = \mathrm{Var}(T_{cipher}) = 2(\ell - 1) \left( \frac{2^n - N}{2^n - 1} \right)^2$,

and for a randomly drawn permutation with mean and variance

$\mu_1 = \mathrm{Exp}(T_{random}) = \ell - 1$ and $\sigma_1^2 = \mathrm{Var}(T_{random}) = 2(\ell - 1)$.

The proof of this proposition is available in the full version of this paper 0 . 


5.4 Distinguishing Complexity 

Applying the standard normal approximation of $\chi^2$ to the two different
distributions of the statistic $T$ in Proposition 2, one can compute the data
complexity $N$ of the distinguisher for given error probabilities. As a rule of thumb,
it is sufficient to have $N \approx 2^{n+2} / \sqrt{\ell}$ distinct plaintexts and their
corresponding ciphertexts to distinguish the cipher distribution from that of a
randomly drawn permutation. A more precise distinguishing complexity is given by
the following statement.

Corollary 2. Under the assumptions of Proposition 2, for type-I error probability
$\alpha_0$ (the probability to wrongfully discard the cipher), type-II error probability
$\alpha_1$ (the probability to wrongfully accept a randomly chosen permutation as the
cipher), for an $n$-bit block cipher exhibiting $\ell$ zero-correlation linear
approximations forming a $\log_2 \ell$-dimensional linear space, the distinguishing
complexity $N$ can be approximated as

$N = \frac{2^n (q_{1-\alpha_0} + q_{1-\alpha_1})}{\sqrt{\ell/2} - q_{1-\alpha_1}}$,

where $q_{1-\alpha_0}$ and $q_{1-\alpha_1}$ are the respective quantiles of the standard normal
distribution.
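
As a numerical illustration of Corollary 2, the sketch below evaluates the formula in the $\log_2$ domain. The function name `data_complexity_log2` is hypothetical; the parameters in the usage note ($n = 128$, $\ell = 2^{64}$, $\alpha_0 = 2^{-2.7}$, $\alpha_1 = 2^{-14}$) are those used for CAST-256 in Section 6.3.

```python
from math import log2, sqrt
from statistics import NormalDist


def data_complexity_log2(n, log2_ell, alpha0, alpha1):
    """log2 of N from Corollary 2 for an n-bit cipher with ell = 2^log2_ell."""
    q0 = NormalDist().inv_cdf(1 - alpha0)     # quantile q_{1 - alpha0}
    q1 = NormalDist().inv_cdf(1 - alpha1)     # quantile q_{1 - alpha1}
    ell = 2.0 ** log2_ell
    return n + log2(q0 + q1) - log2(sqrt(ell / 2) - q1)
```

Calling `data_complexity_log2(128, 64, 2**-2.7, 2**-14)` gives a value close to 98.8, in line with the data complexity stated for the 28-round attack.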

Note that this statistical test is based on the decision threshold
$\tau = \mu_0 + \sigma_0 q_{1-\alpha_0} = \mu_1 - \sigma_1 q_{1-\alpha_1}$: if the statistic $T \le \tau$, the test outputs 'cipher'.
Otherwise, if the statistic $T > \tau$, the test returns 'random'.

6 Multidimensional Zero-Correlation for 28-Round 
CAST-256 

6.1 Description of CAST-256 

CAST-256, a first-round AES candidate, is designed based on CAST-128. The
block size is 128 bits, and the key size can be 128, 192 or 256 bits. CAST-256
has 48 rounds for all key sizes. The design of CAST-256 is a generalized Feistel
network with 4 lines, as illustrated in Fig. 5.

We denote the 128-bit block of CAST-256 as $\beta = (A|B|C|D)$, where $A$, $B$, $C$
and $D$ are 32 bits each. Two types of round function, the forward quad-round
$Q(\cdot)$ and the reverse quad-round $\overline{Q}(\cdot)$, are used in CAST-256.

The forward quad-round $\beta \leftarrow Q_i(\beta)$ is defined as the consecutive application of
4 rounds as follows:

$C = C \oplus F_1(D, K_{R_1}^{(i)}, K_{M_1}^{(i)})$, $B = B \oplus F_2(C, K_{R_2}^{(i)}, K_{M_2}^{(i)})$,

$A = A \oplus F_3(B, K_{R_3}^{(i)}, K_{M_3}^{(i)})$, $D = D \oplus F_1(A, K_{R_4}^{(i)}, K_{M_4}^{(i)})$.

Similarly, the reverse quad-round $\beta \leftarrow \overline{Q}_i(\beta)$ is defined as:

$D = D \oplus F_1(A, K_{R_4}^{(i)}, K_{M_4}^{(i)})$, $A = A \oplus F_3(B, K_{R_3}^{(i)}, K_{M_3}^{(i)})$,

$B = B \oplus F_2(C, K_{R_2}^{(i)}, K_{M_2}^{(i)})$, $C = C \oplus F_1(D, K_{R_1}^{(i)}, K_{M_1}^{(i)})$,

where $K_R^{(i)} = \{K_{R_1}^{(i)}, K_{R_2}^{(i)}, K_{R_3}^{(i)}, K_{R_4}^{(i)}\}$ is the set of rotation keys for
the $i$-th quad-round, and $K_M^{(i)} = \{K_{M_1}^{(i)}, K_{M_2}^{(i)}, K_{M_3}^{(i)}, K_{M_4}^{(i)}\}$ is the set of
masking keys for the $i$-th quad-round.
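
The structure of the two quad-rounds can be sketched as follows. This is not CAST-256 itself: `toy_f` is a hypothetical placeholder standing in for the real round functions $F_1$, $F_2$, $F_3$, and the key lists are arbitrary. The sketch only illustrates that the reverse quad-round applies the same four Feistel steps in the opposite order, so with the same keys it undoes the forward quad-round.

```python
MASK32 = 0xFFFFFFFF


def toy_f(x, k_rot, k_mask):
    """Placeholder round function (NOT the CAST F1/F2/F3): add, then rotate."""
    t = (x + k_mask) & MASK32
    r = k_rot % 32
    return ((t << r) | (t >> (32 - r))) & MASK32


def forward_quad(block, kr, km, f1=toy_f, f2=toy_f, f3=toy_f):
    """Forward quad-round Q on a block (A, B, C, D) of 32-bit words."""
    a, b, c, d = block
    c ^= f1(d, kr[0], km[0])
    b ^= f2(c, kr[1], km[1])
    a ^= f3(b, kr[2], km[2])
    d ^= f1(a, kr[3], km[3])
    return a, b, c, d


def reverse_quad(block, kr, km, f1=toy_f, f2=toy_f, f3=toy_f):
    """Reverse quad-round: the same four steps in the opposite order."""
    a, b, c, d = block
    d ^= f1(a, kr[3], km[3])
    a ^= f3(b, kr[2], km[2])
    b ^= f2(c, kr[1], km[1])
    c ^= f1(d, kr[0], km[0])
    return a, b, c, d
```

Because each step XORs one word with a function of another word that is unchanged by that step, the inversion holds for any choice of round function.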

The encryption procedure for CAST-256 consists of 6 forward quad-rounds
followed by 6 reverse quad-rounds, counting 48 rounds in total. Decryption is
identical to encryption except that the sets of quad-round keys $K_R^{(i)}$ and $K_M^{(i)}$
are applied in the reverse order. The keys are obtained from an up to 256-bit
master key by encrypting it with a CAST-256-type cipher (acting on eight
32-bit words) with known constants as subkeys.

The functions $F_1$, $F_2$ and $F_3$ are exactly those of CAST-128. They use four
8x32-bit S-boxes based on bent functions, modular addition, modular subtraction,
XOR and key-dependent rotation. See Fig.

6.2 24-Round Zero-Correlation Linear Approximations for 
CAST-256 

Property 2. For 24-round CAST-256 (3 forward quad-rounds followed by 3 reverse
quad-rounds, or rounds 13-36), if the input mask is $(0|0|0|L_1)$ and the
output mask is $(0|0|0|L_2)$, the correlation of the linear approximation for the
24-round CAST-256 is zero, where $L_1 \neq L_2$, $L_1 \neq 0$, and $L_2 \neq 0$.

The proof of this property is available in the full version of this paper [£J .

As compared to this 24-round property, the longest impossible differential for 
CAST-256 we are aware of covers 18 rounds m The work j2j claims unspecified 
20-round impossible differentials. Thus, the zero-correlation property for CAST- 
256 is at least 4 rounds longer than the one of impossible differential. 

Fig. 5. Multidimensional zero-correlation cryptanalysis of 28-round CAST-256:
(a) forward quad-round of CAST-256; (b) 24-round zero-correlation linear
approximations of CAST-256 (3 forward and 3 reverse quad-rounds);
(c) multidimensional zero-correlation cryptanalysis of 28-round CAST-256: 4 forward
quad-rounds and 3 reverse quad-rounds (values in brackets denote masks, plain
values are actual data processed)


6.3 Key Recovery for 28-Round CAST-256 

We use the 24-round zero-correlation linear approximations of Property 2 to
attack 28 rounds of CAST-256. Fig. 5 illustrates the recovery of the subkey
values from the first round to the fourth round. The attack works as follows.

For each possible 148-bit subkey value $\kappa = K_R^{(1)} | K_M^{(1)}$:

1. Allocate a 64-bit global counter $V[z]$ for each of the $2^{64}$ possible values of the
64-bit vector $z$ and set it to 0. $V[z]$ will contain the number of times the vector
value $z$ occurs for the current key guess $\kappa$. The vector $z$ is the concatenation
of evaluations of 64 basis zero-correlation masks.

2. For each of $N$ distinct plaintext-ciphertext pairs:

(a) Partially encrypt 4 rounds and get the 64-bit value for $X|C_4$.

(b) Evaluate all 64 basis zero-correlation masks on $X|C_4$ and put the
evaluations into the vector $z$.

(c) Increment $V[z]$.

3. Compute the $\chi^2$ statistic $T = N 2^{64} \sum_{z=0}^{2^{64}-1} \left( \frac{V[z]}{N} - \frac{1}{2^{64}} \right)^2$.

4. If $T < \tau$, then the subkey guess $\kappa$ is a possible subkey candidate and all
master keys it is compatible with are tested exhaustively against a maximum
of 3 plaintext-ciphertext pairs.


Table 2. Summary of attacks on CAST-256: KP = Known Plaintexts, CP = Chosen
Plaintexts

Rounds  Key size       Attack        Data          Time        Memory (bytes)  Ratio of weak keys  Ref.
16      128, 192, 256  boomerang     2^{49.3} CP   -           -               1                   [?]
24      192 or 256     linear        2^{124.1} KP  2^{156.52}  -               1                   [?]
36      256            differential  2^{123} CP    2^{182}     -               2^{-35}             [?]
28      256            multidim. ZC  2^{98.8} KP   2^{246.9}   2^{68}          1                   Here


In this attack, using Corollary 2, we set the type-I error probability (the
probability to miss the right key) to $\alpha_0 = 2^{-2.7}$ and the type-II error probability
(the probability to accept a wrong key) to $\alpha_1 = 2^{-14}$. Thus, we get $q_{1-\alpha_0} = 1$
and $q_{1-\alpha_1} = 3.84$. Here, $\tau = \sigma_1 \cdot q_{\alpha_1} + \mu_1 \approx 2^{64}$.

Corollary 2 suggests that the data complexity is $N = 2^{98.8}$ distinct
plaintext-ciphertext pairs with those parameters. The success probability of the
entire attack is $1 - \alpha_0 \approx 0.846$.

The time complexity is $2^{246.8}$ one-round encryptions and $2^{246.8}$ memory
accesses to a memory of size $2^{64}$. Under the assumption that one memory
access to a memory of size $2^{64}$ is equivalent to one 28-round CAST-256 encryption,
the total time complexity is about $2^{246.9}$ 28-round CAST-256 encryptions.
Since $\alpha_1 = 2^{-14}$ and the total number of recovered bits is 148, the number of
remaining subkey values is $2^{-14} \cdot 2^{148} = 2^{134}$. We then exhaustively search the
other $256 - 148 = 108$ key bits, which takes $2^{134+108} = 2^{242}$
28-round encryptions.

The memory requirements are $2^{64}$ 128-bit words needed for $V[z]$, or $2^{68}$ bytes.

In all, the data complexity is about $2^{98.8}$ known plaintexts, the time complexity
is about $2^{246.9}$ 28-round CAST-256 encryptions and the memory requirements
are $2^{64}$ blocks. This is the first attack on more than half of the full-round
AES candidate CAST-256 without the weak key assumption. See Table 2 for a
summary and a comparison of attacks.
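
The complexity figures can be cross-checked with a short calculation. The sketch below is one plausible reading of how the terms combine, not the authors' own accounting: it assumes that a one-round encryption costs 1/28 of a 28-round encryption and that each counter update costs one full encryption, as the text stipulates. All variable names are illustrative.

```python
from math import log2

log2_n = 98.8                 # data complexity N = 2^98.8 (from the text)
subkey_bits = 148             # guessed bits of the first quad-round keys
log2_alpha1 = -14             # type-II error probability 2^-14

# N * 2^148 partial encryptions and the same number of memory accesses.
log2_guess = log2_n + subkey_bits
# One-round encryptions scaled to 28-round units, plus one unit per access.
log2_main = log2_guess + log2(1 / 28 + 1)
# Surviving subkey guesses times the remaining master-key bits.
log2_exhaustive = (subkey_bits + log2_alpha1) + (256 - subkey_bits)
# Total time, adding both phases in the log2 domain.
log2_total = log2_main + log2(1 + 2 ** (log2_exhaustive - log2_main))
# 2^64 counters of 16 bytes each.
log2_memory_bytes = 64 + log2(128 // 8)
```

Under these assumptions `log2_total` evaluates to roughly 246.9 and `log2_memory_bytes` to 68, matching the stated totals.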

7 Conclusions

In this paper, we establish fundamental links between zero-correlation
distinguishers on the one hand and integral and multidimensional linear
distinguishers on the other. In particular, an integral implies a zero-correlation
property, and zero-correlation distinguishers can be seen as a special case of
multidimensional linear distinguishers. These findings result in two novel
distinguishers for zero-correlation based on integral and multidimensional linear
distinguishers. To obtain the latter, we refine the theory of multidimensional
linear distinguishers. We illustrate these new distinguishers by mounting attacks
on a Skipjack variant and CAST-256.

Abstract. Stream cipher ZUC is the core component in the 3GPP
confidentiality and integrity algorithms 128-EEA3 and 128-EIA3. In this
paper, we present the details of our differential attacks against ZUC 1.4.
The vulnerability in ZUC 1.4 is due to the non-injective property in the
initialization, which results in the difference in the initialization vector
being cancelled. In the first attack, a difference is injected into the first
byte of the initialization vector, and one out of $2^{15.4}$ random keys results
in two identical keystreams after testing $2^{13.3}$ IV pairs for each key.
The identical keystreams pose a serious threat to the use of ZUC 1.4 in
applications since this is similar to reusing a key in a one-time pad. Once
identical keystreams are detected, the key can be recovered with average
complexity $2^{99.4}$. In the second attack, a difference is injected into the
second byte of the initialization vector, and every key can result in two
identical keystreams with about $2^{54}$ IVs. Once identical keystreams are
detected, the key can be recovered with complexity $2^{67}$. We have presented
a method to fix the flaw by updating the LFSR in an injective way
in the initialization. Our suggested method is used in the later versions
of ZUC. The latest ZUC 1.6 is secure against our attacks.


1 Introduction 

Compared to block ciphers, dedicated stream ciphers normally require less
computation to achieve the same security level. Stream ciphers are widely used
in applications. For example, RC4 [H} is used in SSL and WEP, and A5/1 0 is 
used in GSM (the Global System for Mobile Communications). But the use of 
RC4 in WEP is insecure 0, and A5/1 is very weak 0. ECRYPT (2004-2008) 
has organised the eSTREAM competition, which stimulated the study on stream
ciphers, and a number of new stream ciphers were proposed 0-0, 0, 0, 0, Qjj] . 

The 3rd Generation Partnership Project (3GPP) was set up for making 
globally applicable 3G mobile phone system specifications based on the GSM 
specifications. Stream cipher ZUC was designed by the Data Assurance and 
Communication Security Research Center of the Chinese Academy of Sciences. 

* This research is supported by the National Research Foundation Singapore under 
its Competitive Research Programme (CRP Award No. NRF-CRP2-2007-03) and 
Nanyang Technological University NAP startup grant (M4080529.110). 

X. Wang and K. Sako (Eds.): ASIACRYPT 2012, LNCS 7658, pp. 262-277, 2012.
It is the core component of the 3GPP Confidentiality and Integrity Algorithms
128-EEA3 & 128-EIA3 which were proposed for inclusion in the “4G” mobile 
standard LTE (Long Term Evolution). In July 2010, the ZUC 1.4 0 was made 
public for evaluation. We developed two key recovery attacks against ZUC 1.4 
|T3| . and our attacks directly led to the tweak of ZUC 1.4 into ZUC 1.5 Qi„ 
Jan 2011. (Note that it was reported independently in 0 that the non-injective 
initialization of ZUC 1.4 may result in identical keystreams.) The latest version, 
ZUC 1.6 Q , was released in June 2011 (ZUC 1.6 and ZUC 1.5 have almost the 
same specifications). 

In this paper, we present the details of our differential attacks against ZUC 
1.4. Our attacks against ZUC are similar to the differential attacks against Py,
Py6 and Pypy 0, in which different IVs result in identical keystreams. In the 
first attack against ZUC 1.4, the difference is at the first byte of the IV, and
one in $2^{15.4}$ keys results in identical keystreams after testing $2^{13.3}$ IV pairs for
each key. Once identical keystreams are detected, the key can be recovered with
complexity $2^{99.4}$. In the second attack against ZUC 1.4, the difference is at the
second byte of the IV, and identical keystreams can be obtained after testing
$2^{54}$ IVs. The key can be recovered with complexity $2^{67}$.

This paper is organized as follows. The notations and the description of ZUC
1.4 are given in Sect. 2. The overview of the attack is given in Sect. 3. In Sections
4 and 5, we present the key recovery attack with difference at the first byte and
the second byte of the IV, respectively. We suggest the tweak to fix the flaw in Sect.
6. Section 7 concludes the paper.


2 Preliminaries 


2.1 The Notations 


In this paper, we follow the notations used in the ZUC specifications 0 . 


$+$                  The addition of two integers
$\oplus$             The bit-wise exclusive-or operation of integers
$\boxplus$           The modulo $2^{32}$ addition
$ab$                 The product of integers $a$ and $b$
$a \| b$             The concatenation of $a$ and $b$
$a \lll k$           The $k$-bit cyclic shift of $a$ to the left
$a \ggg k$           The $k$-bit cyclic shift of $a$ to the right
$a \gg k$            The $k$-bit right shift of integer $a$
$a_H$                The most significant 16 bits of integer $a$
$a_L$                The least significant 16 bits of integer $a$
$(a_1, a_2, \ldots, a_n) \rightarrow (b_1, b_2, \ldots, b_n)$   It assigns the values of $a_i$ to $b_i$ in parallel
$0_n$                The sequence of $n$ bits 0
$1_n$                The sequence of $n$ bits 1
$\bar{y}$            The bitwise complement of $y$


An integer $a$ can be written in different formats. For example,

a = 25            decimal representation
  = 0x19          hexadecimal representation
  = 00011001_2    binary representation

We number the least significant bit with 1 and use $A[i]$ to denote the $i$th bit of
$A$, and $B[i..j]$ to denote bits $i$ to $j$ of $B$.

2.2 The General Structure of ZUC 1.4 

ZUC is a word-oriented stream cipher with 128-bit secret key and a 128-bit initial 
vector. It consists of three main components: the linear feedback shift register 
(LFSR), the bit-reorganization (BR) and a nonlinear function F. The general 
structure of the algorithm is illustrated in Fig. QJ 



Linear Feedback Shift Register (LFSR). It consists of sixteen 31-bit registers
$s_0, s_1, \ldots, s_{15}$, and each register is an integer in the range $\{1, 2, \ldots, 2^{31} - 1\}$.
During the keystream generation stage, the LFSR is updated as follows:

LFSRUpdate():

1. $s_{16} = (2^{15} s_{15} + 2^{17} s_{13} + 2^{21} s_{10} + 2^{20} s_4 + (1 + 2^8) s_0) \bmod (2^{31} - 1)$;

2. If $s_{16} = 0$ then set $s_{16} = 2^{31} - 1$;

3. $(s_1, s_2, \ldots, s_{15}, s_{16}) \rightarrow (s_0, s_1, \ldots, s_{14}, s_{15})$.
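
The three steps of LFSRUpdate translate directly into Python. This is a minimal sketch; the function name `lfsr_update` and the list representation of the state are illustrative choices, not part of the specification.

```python
P = (1 << 31) - 1  # modulus 2^31 - 1


def lfsr_update(s):
    """One LFSRUpdate() step on a list of sixteen 31-bit registers s[0..15]."""
    assert len(s) == 16
    s16 = ((s[15] << 15) + (s[13] << 17) + (s[10] << 21)
           + (s[4] << 20) + 257 * s[0]) % P   # 257 = 1 + 2^8
    if s16 == 0:
        s16 = P                               # 0 is represented by 2^31 - 1
    return s[1:] + [s16]                      # shift: s1..s16 become s0..s15
```

For example, a state of sixteen ones is mapped to fifteen ones followed by $2^{15} + 2^{17} + 2^{21} + 2^{20} + 2^8 + 1 = 3309825$.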

Bit-Reorganization Function. It extracts 128 bits from the state of the LFSR
and forms four 32-bit words $X_0$, $X_1$, $X_2$ and $X_3$ as follows:

Bitreorganization():

1. $X_0 = s_{15H} \| s_{14L}$;

2. $X_1 = s_{11L} \| s_{9H}$;

3. $X_2 = s_{7L} \| s_{5H}$;

4. $X_3 = s_{2L} \| s_{0H}$;
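
A sketch of the bit-reorganization in Python follows. It assumes, as in the ZUC specification, that for a 31-bit register the high half consists of bits 30..15 and the low half of bits 15..0, so the two halves share one bit. The helper names are illustrative.

```python
def high16(x):
    """Most significant 16 bits of a 31-bit register (bits 30..15)."""
    return (x >> 15) & 0xFFFF


def low16(x):
    """Least significant 16 bits of a 31-bit register (bits 15..0)."""
    return x & 0xFFFF


def bit_reorganization(s):
    """Return (X0, X1, X2, X3) from the sixteen LFSR registers s[0..15]."""
    x0 = (high16(s[15]) << 16) | low16(s[14])
    x1 = (low16(s[11]) << 16) | high16(s[9])
    x2 = (low16(s[7]) << 16) | high16(s[5])
    x3 = (low16(s[2]) << 16) | high16(s[0])
    return x0, x1, x2, x3
```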

Nonlinear Function F. It contains two 32-bit memory words $R_1$ and $R_2$. The
description of F is given below. In function F, S is the Sbox layer and $L_1$ and
L 2 are linear transformations as defined in jlj. The output of function F is a 
32-bit word $W$. The keystream word $Z$ is given as $Z = W \oplus X_3$.

$F(X_0, X_1, X_2)$:

1. $W = (X_0 \oplus R_1) \boxplus R_2$;

2. $W_1 = R_1 \boxplus X_1$;

3. $W_2 = R_2 \oplus X_2$;

4. $R_1 = S(L_1(W_{1L} \| W_{2H}))$;

5. $R_2 = S(L_2(W_{2L} \| W_{1H}))$;


2.3 The Initialization of ZUC 1.4 

The initialization of ZUC 1.4 consists of two steps: loading the key and IV into 
the register, and running the cipher for 32 steps with the keystream word being 
used to update the state. 

Key and IV Loading. Denote the 16 key bytes as $k_i$ ($0 \le i \le 15$), the
16 IV bytes as $iv_i$ ($0 \le i \le 15$). We load the key and IV into the register
as: Si = (ki\\di\\ivi). The values of the constants di are given in 0 . The two 
memory words $R_1$ and $R_2$ in function F are set to 0.

Running the Cipher for 32 Steps. At the initialization stage, the keystream 
word Z is used to update the LFSR as follows: 

LFSRWithInitialisationMode($u$):

1. $v = (2^{15} s_{15} + 2^{17} s_{13} + 2^{21} s_{10} + 2^{20} s_4 + (1 + 2^8) s_0) \bmod (2^{31} - 1)$;

2. If $v = 0$ then set $v = 2^{31} - 1$;

3. $s_{16} = v \oplus u$;

4. If $s_{16} = 0$ then set $s_{16} = 2^{31} - 1$;

5. $(s_1, s_2, \ldots, s_{15}, s_{16}) \rightarrow (s_0, s_1, \ldots, s_{14}, s_{15})$.

The cipher runs for 32 steps at the initialization stage as follows:

InitializationStage():
for $i$ = 0 to 31 {

1. Bitreorganization();

2. $Z = F(X_0, X_1, X_2) \oplus X_3$;

3. LFSRWithInitialisationMode($Z \gg 1$).

}

3 Overview of the Attacks 

We notice that the LFSR in ZUC is defined over $GF(2^{31} - 1)$, with the element
0 being replaced with $2^{31} - 1$. To the best of our knowledge, it is the first time
that $GF(2^{31} - 1)$ is used in the design of a stream cipher. In the initialization of
ZUC 1.4, we notice that XOR is involved in the update of the LFSR ($s_{16} = v \oplus u$).
When XOR is applied to the elements in $GF(2^{31} - 1)$, we obtain the following
undesirable property:

Property 1. Suppose that $a$ and $a'$ are two elements in $GF(2^{31} - 1)$, $a \neq a'$,
and $a = \overline{a'}$. If $b = a$ or $b = a'$, then $a \oplus b \bmod (2^{31} - 1) = a' \oplus b \bmod (2^{31} - 1) = 0$.

The above property shows that the difference between a and a' can get eliminated 
with an XOR operation! In the rest of this paper, we exploit this property to 
attack ZUC 1.4 by eliminating the difference in the state. 
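
Property 1 is easy to check numerically. In the sketch below the helper names are illustrative: two 31-bit values that are bitwise complements of each other XOR to the all-ones word $2^{31} - 1$, which is congruent to 0 modulo $2^{31} - 1$, while a value XORed with itself gives 0 directly.

```python
P = (1 << 31) - 1  # 2^31 - 1, which is also the all-ones 31-bit word


def complement31(a):
    """Bitwise complement of a 31-bit value."""
    return a ^ P


def xor_mod(a, b):
    """XOR followed by reduction modulo 2^31 - 1."""
    return (a ^ b) % P


a = 0x12345678
a_prime = complement31(a)
collisions = [(xor_mod(a, b), xor_mod(a_prime, b)) for b in (a, a_prime)]
```

Here `collisions` is `[(0, 0), (0, 0)]`: for both admissible values of `b`, the two distinct inputs `a` and `a_prime` are mapped to the same result.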

In our attacks, we try to eliminate the difference in the state without the 
difference in the state being injected into the nonlinear function F. The reason 
is that if a difference is injected into F, then Sboxes would be involved, and the 
difference would remain in F until additional difference being injected into F, 
thus the probability that the difference in the state being eliminated would get 
significantly reduced. 

We now investigate what are the IV differences that would result in the dif- 
ference in the state being eliminated with high probability. The IV differences 
are classified into the following three types: 

Type 1. $\Delta iv_i \neq 0$ for at least one value of $i$ ($7 \le i \le 15$).

After loading this type of IVs into the LFSR, the difference would appear at the
least significant byte of at least one of the LFSR elements $s_7, s_8, \ldots, s_{15}$. Note
that the least significant byte of $s_7$ is part of $X_2$ in the Bit-reorganization
function since $X_2 = s_{7L} \| s_{5H}$, and $X_2$ is an input to function F. Due to the shift
of the LFSR, the difference at the least significant byte of $s_7, s_8, \ldots, s_{15}$ would be
injected into F. Thus we would not use this type of IV difference in our attacks.

Type 2. $\Delta iv_i = 0$ for $7 \le i \le 15$, $\Delta iv_i \neq 0$ for at least one value of $i$ ($2 \le i \le 6$).
After loading this type of IVs into the LFSR, the difference would appear at the least
significant byte of at least one of the LFSR elements $s_2, s_3, \ldots, s_6$. Note that
the least significant byte of $s_2$ is part of $X_3$ in the Bit-reorganization function
since $X_3 = s_{2L} \| s_{0H}$; $X_3$ is XORed with the output of F to generate the keystream
word $Z$, and $Z$ is used to update the LFSR. Two steps later, the difference in $iv_2$
would appear in the feedback function to update the LFSR. It means that if there is
a difference in $iv_2$, the difference in $s_2$ would be used to update the LFSR twice,
and the probability that the difference would be eliminated is very small. Due to
the shift of the LFSR, the difference at $s_2, s_3, \ldots, s_6$ would be eliminated with very
small probability. Thus we did not use this type of IV difference in our attacks.

Type 3. $\Delta iv_i = 0$ for $2 \le i \le 15$, $\Delta iv_0 \neq 0$ or $\Delta iv_1 \neq 0$.

The focus of our attacks is on this type of IV differences. In order to increase
the chance of success, we consider the difference at only one byte of the IV. We
discuss below how the difference in the state can be eliminated when there is a
difference in $s_0$ (the analysis for the difference in $s_1$ is similar). At the first step
in the initialization,

$s_0 = (k_0 \| d_0 \| iv_0)$, (1)

$v = 2^{15} s_{15} + 2^{17} s_{13} + 2^{21} s_{10} + 2^{20} s_4 + (1 + 2^8) s_0 \bmod (2^{31} - 1)$, (2)

$s_{16} = v \oplus u$. (3)

Suppose that the difference is only at $iv_0$, and $iv_0 - iv_0' = \Delta iv_0 > 0$. From (1)
and (2) we know that

$v - v' = (1 + 2^8)(iv_0 - iv_0') \bmod (2^{31} - 1) = \Delta iv_0 \| \Delta iv_0$. (4)

If we need to eliminate the difference in $s_{16}$, from Property 1 and (3), the
following conditions should be satisfied:

$v \oplus v' = 1_{31}$ (5)

$u = v$ or $u = v'$ (6)

According to (5), $v$ and $v'$ have XOR difference in the left-most 15 bits (i.e. $v[17..31]$
and $v'[17..31]$), while according to (4), the subtraction difference of those bits is 0.
The only possible reason is that the 15 bits $v[17..31]$ are all affected by the carries
from the addition of $\Delta iv_0$ to $v'$. After testing all the one-byte differences, we found
that $v$ must be in one of the following four forms (the values of $v$ and $v'$ can be
swapped):


$v = 1111111111111111_2 \| y \| 1_2 \| y$

or

$v = 0111111111111111_2 \| y \| 0_2 \| y$

or

$v = 0000000000000000_2 \| \bar{y} \| 0_2 \| \bar{y}$ (7)

or

$v = 1000000000000000_2 \| \bar{y} \| 1_2 \| \bar{y}$

($y$ is a 7-bit integer.)

There are 510 possible values of $v$ ($v = 1_{31}$ and $v = 0_{31}$ are excluded since neither
$v$ nor $v'$ can be 0). All the $(v, v')$ pairs and their differences are given in
Table G] in Appendix El Notice that we ignored the order of v and v' as they are 
exchangeable. We have obtained all the possible values of v and u for generating 
identical keystreams. 
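
The count of 510 values can be reproduced by solving conditions (4) and (5) directly. The sketch below is an illustration, not the authors' search: it uses the fact that $v \oplus v' = 1_{31}$ means $v' = (2^{31} - 1) - v$, and then looks for the $v$ with $v - v' \equiv \Delta iv_0 \| \Delta iv_0$ for each one-byte difference. The function name `collision_pairs` is hypothetical.

```python
P = (1 << 31) - 1  # 2^31 - 1, also the all-ones word 1_31


def collision_pairs():
    """Map each one-byte difference delta to the pair (v, v') with
    v XOR v' = 1_31 and v - v' = delta || delta (mod 2^31 - 1)."""
    pairs = {}
    for delta in range(1, 256):
        d = delta * 257                       # (1 + 2^8) * delta = delta || delta
        for v in ((P + d) // 2, d // 2):      # candidate for odd / even d
            vp = P - v                        # bitwise complement of v
            if 0 < v < P and (v - vp) % P == d:
                pairs[delta] = (v, vp)
    return pairs
```

Each of the 255 differences yields exactly one unordered pair, which gives the 510 values and also shows why the difference determines the pair uniquely.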

We highlight the following property in the table: the difference between v 
and v' uniquely determines the value of pair (v, v') in the table. As a result, if 
we know the difference of IVs that results in the collision of the state, we can 
determine the value of ( v , v') immediately. 

By eliminating the difference in the state as illustrated above, we developed 
two attacks against ZUC 1.4. The first attack is to exploit the difference at iv 0, 
and the second attack is to exploit the difference at iv 1 . The details are given in 
the following two sections. 


4 Attack ZUC 1.4 with Difference at iv 0 

In this section, we present our first differential attack on the initialization by
using an IV difference at $iv_0$ and generating identical keystreams. The keys that
generate the same keystream are called weak keys in this attack. We will show
that a weak key exists with probability $2^{-15.4}$, and a weak key can be detected
with about $2^{13.3}$ chosen IVs. Once a weak key is detected, its effective key size
is reduced from 128 bits to around 100 bits.

4.1 The Weak Keys for $\Delta iv_0$

We will show that when there is a difference at $iv_0$, about one in $2^{15.4}$ keys would
result in identical keystreams. For a random key, we will check whether there
exists a pair of IVs such that 0 , 0 and 0 can be satisfied. 

We start with analyzing how keys and IVs are involved in the expression of $u$
and $v$ in the first step of initialization. From the specifications of the
initialization, we have

$u = Z \gg 1 = (X_0 \oplus X_3) \gg 1 = ((s_{15H} \| s_{14L}) \oplus (s_{2L} \| s_{0H})) \gg 1$
$= ((k_{15} \| iv_2 \| k_0 \| iv_{14}) \oplus \mathtt{0x6b8f9a89}) \gg 1$ (8)

In (2) and (8), there are 5 bytes of key, $\{k_0, k_4, k_{10}, k_{13}, k_{15}\}$, and 7 bytes of IV,
$\{iv_0, iv_2, iv_4, iv_{10}, iv_{13}, iv_{14}, iv_{15}\}$, involved in the computation of $u$ and $v$.
The complexity would be very high if we directly tried all possible combinations
of the keys and IVs. However, by analyzing the expressions of $u$ and $v$, we
can reduce the search space from $2^{96}$ to around $2^{26.3}$.

Solving the above conditions, we obtain the following four groups of solutions:

Group 1.
    $u = v = 1111111111111111_2 \| y \| 1_2 \| y$
    $k_{15} = \mathtt{0x94}$
    $iv_2 = \mathtt{0x70}$
    $k_0 = \mathtt{0x9a} \oplus (y \| 1_2)$
    $iv_{14} \gg 1 = \mathtt{0x44} \oplus y$                      (9)

Group 2.
    $u = v = 0111111111111111_2 \| y \| 0_2 \| y$
    $k_{15} = \mathtt{0x14}$
    $iv_2 = \mathtt{0x70}$
    $k_0 = \mathtt{0x9a} \oplus (y \| 0_2)$
    $iv_{14} \gg 1 = \mathtt{0x44} \oplus y$                      (10)

Group 3.
    $u = v = 0000000000000000_2 \| \bar{y} \| 0_2 \| \bar{y}$
    $k_{15} = \mathtt{0x6b}$
    $iv_2 = \mathtt{0x8f}$
    $k_0 = \mathtt{0x9a} \oplus (\bar{y} \| 0_2)$
    $iv_{14} \gg 1 = \mathtt{0xbb} \oplus y$                      (11)

Group 4.
    $u = v = 1000000000000000_2 \| \bar{y} \| 1_2 \| \bar{y}$
    $k_{15} = \mathtt{0xeb}$
    $iv_2 = \mathtt{0x8f}$
    $k_0 = \mathtt{0x9a} \oplus (\bar{y} \| 1_2)$
    $iv_{14} \gg 1 = \mathtt{0xbb} \oplus y$                      (12)

Furthermore, from (2) we compute $v$ as follows (note the property
$2^k \cdot s_i \bmod (2^{31} - 1) = s_i \lll k$):

$v = (1 + 2^{23}) k_0 + 2^7 k_{15} + 2^9 (k_{13} + 2^3 k_4 + 2^4 k_{10}) + (1 + 2^8) iv_0$
$\quad + 2^{15} (iv_{15} + 2^2 iv_{13} + 2^5 iv_4 + 2^6 iv_{10}) + \mathtt{0x451bfe1b} \bmod (2^{31} - 1)$ (13)

Let $sum_1 = k_{13} + 2^3 k_4 + 2^4 k_{10}$ and $sum_2 = iv_{15} + 2^2 iv_{13} + 2^5 iv_4 + 2^6 iv_{10}$. The value
of $sum_1$ ranges from 0 to 6375, and the value of $sum_2$ ranges from 0 to 25755.
We developed Algorithm 1 to search for weak keys.

Algorithm 1. Find weak keys for $\Delta iv_0$
for $(k_{15}, iv_2)$ in each of the 4 groups of solutions (9), (10), (11), (12) do
  for $y$ = 0 to 127 do
    determine $iv_{14} \gg 1$ and $k_0$
    for $sum_1$ = 0 to 6375 do
      for $iv_0$ = 0 to 255 do
        $keySum \leftarrow 2^7 k_{15} + (2^{23} + 1) k_0 + 2^9 sum_1 \bmod (2^{31} - 1)$
        $sum_2 \leftarrow (u - keySum - (1 + 2^8) iv_0 - \mathtt{0x451bfe1b}) / 2^{15} \bmod (2^{31} - 1)$
        if $sum_2$ is less than 25756 then
          $v = u$; $v' = u \oplus 1_{31}$;
          if $(v - v') \bmod (2^{31} - 1)$ is a multiple of $1 + 2^8$ then
            $\Delta iv_0 = (v - v') \bmod (2^{31} - 1) / (1 + 2^8)$;
            $iv_0' = iv_0 - \Delta iv_0$;
          else
            $\Delta iv_0 = (v' - v) \bmod (2^{31} - 1) / (1 + 2^8)$;
            $iv_0' = iv_0 + \Delta iv_0$;
          end if
          output $u, k_0, k_{15}, sum_1, iv_0, iv_0', iv_2, iv_{14} \gg 1, sum_2$
        end if
      end for
    end for
  end for
end for


Each output from Algorithm 1 gives a value of ($k_{15}$, $k_0$, $sum_1$, $iv_0$, $iv_0'$,
$iv_2$, $iv_{14}$, $sum_2$) that results in identical keystreams. Running Algorithm 1,
we found $9934 = 2^{13.28}$ different outputs. We note that on average, each
$sum_1$ from the output of the algorithm represents $2^{24}/6376 = 2^{11.36}$ possible
choices of ($k_4$, $k_{10}$, $k_{13}$). Thus there are $2^{13.3} \times 2^{11.4} = 2^{24.7}$ weak values of
($k_0$, $k_4$, $k_{10}$, $k_{13}$, $k_{15}$). Hence, there are $2^{24.7}$ weak keys out of $2^{40}$ possible values
of the 5 key bytes. The probability that a random key is weak for IV difference
at $iv_0$ is $2^{-15.4}$. The complexity of Algorithm 1 is $4 \times 128 \times 6376 \times 256 = 2^{26.3}$.
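
The weak-key fraction follows from the two counts given above. The short calculation below only restates that arithmetic in the $\log_2$ domain; the variable names are illustrative.

```python
from math import log2

outputs = 9934                               # outputs of Algorithm 1 (from the text)
log2_outputs = log2(outputs)                 # about 13.28
# Average number of (k4, k10, k13) triples per value of sum_1.
log2_triples_per_sum = log2(2**24 / 6376)    # about 11.36
log2_weak = log2_outputs + log2_triples_per_sum
# Fraction of weak values among the 2^40 values of the five key bytes.
log2_probability = log2_weak - 40
```

With unrounded exponents the result is about $2^{-15.36}$, which the text rounds to $2^{-15.4}$.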

Identical Keystreams. We give below a weak key and an IV pair with differ- 
ence at iv 0 that result in identical keystreams. 

key = 87,4,95,13,161,32,199,61,20,147,56,84,126,205,165,148 
IV = 166,166,112,38,192,214,34,211,170,25,18,71,4,135,68,5 
IV' = 116,166,112,38,192,214,34,211,170,25,18,71,4,135,68,5 

For both IV and IV', the identical keystreams are: 0xbfe800d5 0360a22b
6c4554c8 67f00672 2ce94f3f f94d12ba 11c382b3 cbaf4b31 ...
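
The first initialization step for this key and IV pair can be replayed to see the collision in $s_{16}$. The sketch below is an illustration under stated assumptions: the constants $d_i$ are not listed in this paper, so the values in `D` are taken from the ZUC specification for the registers used in this step only, and they agree with the constants 0x6b8f9a89 and 0x451bfe1b quoted in the text. Function and variable names are hypothetical.

```python
P = (1 << 31) - 1  # modulus 2^31 - 1

# 15-bit loading constants d_i (assumed from the ZUC specification),
# listed only for the registers that enter the first step.
D = {0: 0x44D7, 2: 0x626B, 4: 0x5789, 10: 0x6BC4,
     13: 0x3C4D, 14: 0x789A, 15: 0x47AC}

KEY = [87, 4, 95, 13, 161, 32, 199, 61, 20, 147, 56, 84, 126, 205, 165, 148]
IV_A = [166, 166, 112, 38, 192, 214, 34, 211, 170, 25, 18, 71, 4, 135, 68, 5]
IV_B = [116] + IV_A[1:]                      # differs from IV_A only in iv_0


def first_step(key, iv):
    """Return (u, v, s16) for the first initialization step of ZUC 1.4."""
    s = {i: (key[i] << 23) | (D[i] << 8) | iv[i] for i in D}   # k_i || d_i || iv_i
    x0 = (((s[15] >> 15) & 0xFFFF) << 16) | (s[14] & 0xFFFF)   # s15H || s14L
    x3 = ((s[2] & 0xFFFF) << 16) | ((s[0] >> 15) & 0xFFFF)     # s2L || s0H
    u = (x0 ^ x3) >> 1                       # R1 = R2 = 0, so F outputs W = X0
    v = ((s[15] << 15) + (s[13] << 17) + (s[10] << 21)
         + (s[4] << 20) + 257 * s[0]) % P
    v = v or P                               # 0 is represented by 2^31 - 1
    s16 = (v ^ u) or P
    return u, v, s16
```

For both IVs the value of $u$ is the same, the two values of $v$ are bitwise complements of each other, and $u$ equals one of them, so conditions (5) and (6) hold and both initializations write the same $s_{16}$.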


4.2 Detecting Weak Keys for $\Delta iv_0$

We have shown above that a random key is weak with probability $2^{-15.4}$. In the
attack against ZUC, we will first detect a weak key, then recover it. To detect
a weak key, our approach is to use the IV pairs generated from Algorithm 1 to
test whether identical keystreams are generated. Note that for a particular value
of $sum_2$, we can always find a combination of ($iv_4$, $iv_{10}$, $iv_{13}$, $iv_{15}$) that satisfies
$sum_2 = iv_{15} + 2^2 iv_{13} + 2^5 iv_4 + 2^6 iv_{10}$. Thus a pair of IVs ($iv_0$, $iv_2$, $iv_4$, $iv_{10}$,
$iv_{13}$, $iv_{14}$, $iv_{15}$) and ($iv_0'$, $iv_2$, $iv_4$, $iv_{10}$, $iv_{13}$, $iv_{14}$, $iv_{15}$) can be determined by each
output of Algorithm 1. Using this result, we developed Algorithm 2 to detect
weak keys for $\Delta iv_0$.


Algorithm 2. Detecting weak keys for $\Delta iv_0$

1. Choose one of the $2^{13.28}$ outputs of Algorithm 1.

2. Find the pair of IVs determined by this output (if $iv_i$ does not appear in the first
initialization step, set it to some fixed constant).

3. Use the IV pair to generate two keystreams.

4. If the keystreams are identical, output the IVs and conclude the key is weak.

5. If all outputs of Algorithm 1 have been checked, and there are no identical
keystreams, we conclude that the key is not weak.


In Algorithm 2, we need to test at most $2^{13.3}$ pairs of IVs to determine if a key
is weak for difference at $iv_0$.


4.3 Recovering Weak Keys for $\Delta iv_0$

After detecting a weak key, we proceed to recover the weak key. Once a key is
detected as weak (as given from Algorithm 2), from the IV pair being used to
generate identical keystreams, we immediately know the value of $k_0$, $k_{15}$ and
$sum_1$. Note that $sum_1 = k_{13} + 2^3 k_4 + 2^4 k_{10}$. In the best situations, the sum
is 0 or 6375, and we can uniquely determine $k_4$, $k_{10}$ and $k_{13}$. In the worst
situation, there are $2^{12}$ possible choices for $k_4$, $k_{10}$ and $k_{13}$, and therefore we
need $2^{12}$ tests to determine the correct values for $k_4$, $k_{10}$ and $k_{13}$. On average,
for each value of $sum_1$, we need to test $2^{11.4}$ combinations of ($k_4$, $k_{10}$, $k_{13}$).

Since only five key bytes are recovered in our attack, the remaining
11 key bytes should be recovered with exhaustive search. Hence, the complexity
to recover all key bits is $2^{88} \times 2^{11.4} = 2^{99.4}$. From the analysis above, we also
know that the best complexity is $2^{88}$ and the worst complexity is $2^{100}$.
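
The best, worst and average numbers of candidates for ($k_4$, $k_{10}$, $k_{13}$) can be checked by counting, for every value of $sum_1$, the byte triples that produce it. The sketch below is illustrative and the function name is hypothetical; it builds the counts in two stages to avoid looping over all $2^{24}$ triples.

```python
def sum1_counts():
    """counts[s] = number of byte triples (k13, k4, k10) with
    k13 + 8*k4 + 16*k10 == s, for s in 0..6375."""
    partial = [0] * (255 + 8 * 255 + 1)       # values of k13 + 8*k4
    for k4 in range(256):
        for k13 in range(256):
            partial[k13 + 8 * k4] += 1
    counts = [0] * 6376
    for k10 in range(256):
        for value, ways in enumerate(partial):
            counts[value + 16 * k10] += ways
    return counts
```

The extreme sums 0 and 6375 each have a single preimage, the largest count is $2^{12}$, and the average over the 6376 sums is $2^{24}/6376$.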

5 Attack ZUC 1.4 with Difference at iv 1 

In this section, we present the differential attack on ZUC 1.4 for IV difference at 
iv 1 . Different from the attack in Section 4, we need to consider the computation 
of u and v in the second step of the initialization. For this type of IV difference, 
for every key, there are some IV pairs that result in identical keystreams since 
more IV bytes are involved. Once we found such an IV pair, we can recover the 
key with complexity around 2 67 . 

5.1 Identical Keystreams for $\Delta iv_1$

The computation of $u$ and $v$ in the second initialization step involves more key
and IV bytes. The $v$ in the second initialization step is computed as:

$v = (2^{15} s_{16} + 2^{17} s_{14} + 2^{21} s_{11} + 2^{20} s_5 + (1 + 2^8) s_1) \bmod (2^{31} - 1)$,

$s_{16} = ((2^{15} s_{15} + 2^{17} s_{13} + 2^{21} s_{10} + 2^{20} s_4 + (1 + 2^8) s_0) \bmod (2^{31} - 1))$ (14)
$\quad \oplus (((k_{15} \| iv_2 \| k_0 \| iv_{14}) \oplus \mathtt{0x6b8f9a89}) \gg 1)$

And $u$ is given as:

$u = (((X_0 \oplus R_1) \boxplus R_2) \oplus X_3) \gg 1$
$X_0 = (s_{16H} \| 10101100_2 \| iv_{15})$
$X_3 = (01011110_2 \| iv_3 \| k_1 \| 01001101_2)$ (15)
$R_1 = S(L_1(s_{9H} \| s_{7L})) = f_1(iv_7, k_9)$
$R_2 = S(L_2(s_{5H} \| s_{11L})) = f_2(iv_{11}, k_5)$

where $f_1$ and $f_2$ are some deterministic non-linear functions.

There are 10 IV bytes involved in the expression of $v$, i.e. ($iv_0$, $iv_1$, $iv_2$, $iv_4$, $iv_5$,
$iv_{10}$, $iv_{11}$, $iv_{13}$, $iv_{14}$, $iv_{15}$), and 8 IV bytes involved in the expression of $u$, i.e. ($iv_0$,
$iv_3$, $iv_4$, $iv_7$, $iv_{10}$, $iv_{11}$, $iv_{13}$, $iv_{15}$). In total, there are 12 IV bytes involved
in the computation of $u$ and $v$, and every bit of $u$ and $v$ can be affected by the IV.
We conjecture that for every key, the conditions (5) and (6) can be satisfied, and
identical keystreams can be generated. To verify it, we tested 1000 random keys.
Our experimental results show that there is always an IV pair for each key that
results in identical keystreams.

In the attack, a random key and a random iv pair with difference at 
iv 1, the probability that v and u satisfy the conditions (0) and 0) is 
2 -31 x 2 -31 x 2 = 2 -61 . Choosing 2 s iv s with difference at iv 1, we have around 
2 15 pairs. The identical keystream pair appears with probability 2 -61+15 = 2 -46 
with 2 s IVs. We thus need about 2 46 x 2 8 = 2 54 IVs to obtain identical 
keystreams. 
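The data complexity estimate above can be rechecked numerically. The following is only a sketch of the arithmetic, not part of the attack itself; the variable names are illustrative, and the exact pair count $\binom{2^8}{2}$ is used where the text rounds to $2^{15}$.

```python
from math import comb, log2

# log2 of the probability that a random IV pair differing only at iv_1
# yields identical keystreams, as stated in the text: 2^-31 * 2^-31 * 2.
log2_p_pair = -31 - 31 + 1

# iv_1 is a single byte, so at most 2^8 IVs can share all other bytes.
ivs_per_batch = 2 ** 8
pairs_per_batch = comb(ivs_per_batch, 2)   # exact number of unordered pairs

# Expected number of batches before one colliding pair appears (log2),
# and the resulting total number of IVs (log2).
log2_batches = -(log2_p_pair + log2(pairs_per_batch))
log2_total_ivs = log2_batches + log2(ivs_per_batch)

print(f"pairs per batch ~ 2^{log2(pairs_per_batch):.2f}")
print(f"total IVs       ~ 2^{log2_total_ivs:.2f}")
```

With the exact pair count the estimate stays within a small fraction of a bit of the $2^{54}$ quoted in the text.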

Identical Keystreams. We give below a key and an IV pair with difference at $iv_1$ that result in identical keystreams. The algorithm used to find the IV pair is given in Appendix B. The algorithm is somewhat complicated since a number of optimization tricks are involved. The explanation of the optimization details is omitted here since our focus is on developing a key recovery attack.

key= 123,149,193,87,42,150,117,4,209,101,85,57,46,117,49,243 
IV = 92,80,241,10,0,217,47,224,48,203,0,45,204,0,0,17 
IV' = 92,182,241,10,0,217,47,224,48,203,0,45,204,0,0,17 

The identical keystreams are: 0xf09cc17d 41f12d3f 453ac0c3 cadcef9f f98fb964 ca6e576e b48b813 6c43da22
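As a quick sanity check on the example above, the following sketch confirms that the two IVs differ only in byte $iv_1$. It does not implement ZUC 1.4 and therefore does not reproduce the keystream; the helper name `differing_positions` is purely illustrative.

```python
# Key and IV pair copied from the example in the text (decimal byte values).
key = [123, 149, 193, 87, 42, 150, 117, 4, 209, 101, 85, 57, 46, 117, 49, 243]
iv_a = [92, 80, 241, 10, 0, 217, 47, 224, 48, 203, 0, 45, 204, 0, 0, 17]
iv_b = [92, 182, 241, 10, 0, 217, 47, 224, 48, 203, 0, 45, 204, 0, 0, 17]

def differing_positions(a, b):
    """Return the indices at which two equal-length byte lists differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

diff = differing_positions(iv_a, iv_b)
xor_delta = iv_a[1] ^ iv_b[1]          # XOR difference in iv_1
sub_delta = (iv_b[1] - iv_a[1]) % 256  # additive difference in iv_1

print("differing byte positions:", diff)
print("XOR difference at iv_1:", hex(xor_delta))
```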
5.2 Key Recovery for $\Delta iv_1$

After identical keystreams are generated from an IV pair with difference at $iv_1$, we proceed to recover the secret key. From the table in the appendix we know the value of $(v, v')$, since we know the difference at $iv_1$ of the chosen IV pair, and we also know the value of $u$, since $u = v$ or $u = v'$. In the following, we illustrate a key recovery attack after identical keystreams have been detected.

1. The expression of $u$ in (15) involves $(k_1, k_5, k_9, s_{16H})$. Note that there are only two possible values of the 31-bit $u$. We try all the possible values of $(k_1, k_5, k_9, s_{16H})$; there are then $2^{8 \times 3 + 16} \times 2^{-31} \times 2 = 2^{10}$ possible values of $(k_1, k_5, k_9, s_{16H})$ that generate the two possible values of $u$. The complexity of this step is $2^{40}$.

2. Next we use the expression of $s_{16}$ in (14). For each of the $2^{10}$ possible values of $(k_1, k_5, k_9, s_{16H})$, we try all the possible values of $(k_0, k_4, k_{10}, k_{13}, k_{15})$ and check whether the value of $s_{16H}$ is computed correctly or not. There would be $2^{8 \times 5} \times 2^{-16} = 2^{24}$ possible values of $(k_0, k_4, k_{10}, k_{13}, k_{15})$ left. Considering that there are $2^{10}$ possible values of $(k_1, k_5, k_9, s_{16H})$, about $2^{10} \times 2^{24} = 2^{34}$ possible values of $(k_0, k_1, k_4, k_5, k_9, k_{10}, k_{13}, k_{15}, s_{16H})$ remain. The complexity of this step is $2^{8 \times 5} \times 2^{10} = 2^{50}$.

3. Then we use the expression of $v$. For each of the $2^{34}$ possible values of $(k_0, k_1, k_4, k_5, k_9, k_{10}, k_{13}, k_{15}, s_{16H})$, we try all the possible values of $(k_{11}, k_{14})$ and check whether the value of $v$ is correct or not. For each candidate, the expected number of values of $(k_{11}, k_{14})$ passing the test is $2^{8 \times 2} \times 2^{-31} = 2^{-15}$. Considering that there are $2^{34}$ possible values of $(k_0, k_1, k_4, k_5, k_9, k_{10}, k_{13}, k_{15}, s_{16H})$, about $2^{34} \times 2^{-15} = 2^{19}$ possible values of $(k_0, k_1, k_4, k_5, k_9, k_{10}, k_{11}, k_{13}, k_{14}, k_{15})$ remain. The complexity of this step is $2^{8 \times 2} \times 2^{34} = 2^{50}$.

4. For each of the $2^{19}$ possible values of $(k_0, k_1, k_4, k_5, k_9, k_{10}, k_{11}, k_{13}, k_{14}, k_{15})$, we recover the remaining 6 key bytes $(k_2, k_3, k_6, k_7, k_8, k_{12})$ by exhaustive search. The complexity of this step is $2^{19} \times 2^{8 \times 6} = 2^{67}$.

The overall computational complexity to recover a key is $2^{40} + 2^{50} + 2^{50} + 2^{67} \approx 2^{67}$, and we need about $2^{54}$ IVs in the attack. Note that the complexity of the first, second and third steps can be significantly reduced with optimization since we are dealing with simple functions. For example, a meet-in-the-middle attack can be used in the first step, and the sum of a few key bytes can be considered in the second and third steps. However, the complexity of those three steps has little effect on the overall complexity of the attack, so we do not present the details of the optimization here.
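The bookkeeping in the four steps can be retraced with a few lines of arithmetic. This is a sketch that only tracks exponents (log base 2) as described in the text; the variable names are illustrative and no part of the cipher is evaluated.

```python
from math import log2

# Step 1: guess three key bytes and the 16-bit s16H; filter on the
# 31-bit value of u, which has two admissible targets (+1 bit).
work1 = 8 * 3 + 16
left1 = work1 - 31 + 1

# Step 2: for each survivor, guess five more key bytes; filter on the
# 16 bits of s16H.
work2 = 8 * 5 + left1
left2 = work2 - 16

# Step 3: for each survivor, guess two more key bytes; filter on the
# 31-bit value of v.
work3 = 8 * 2 + left2
left3 = work3 - 31

# Step 4: exhaustive search over the remaining six key bytes.
work4 = left3 + 8 * 6

total_log2 = log2(sum(2 ** w for w in (work1, work2, work3, work4)))
print(work1, work2, work3, work4, round(total_log2, 4))
```

The final step dominates, so the total is indistinguishable from $2^{67}$ at any reasonable precision.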

6 Improving ZUC 1.4 

From the analysis in Sect. 3, the weakness of the initialization comes from the non-injective update of the LFSR. To fix the flaw, we proposed a tweak at the rump session of Asiacrypt 2010. Instead of using the XOR operation, it is better to use addition modulo $2^{31} - 1$, i.e. addition over GF($2^{31} - 1$). More specifically, the operation $s_{16} = v \oplus u$ is changed to $s_{16} = v + u \bmod (2^{31} - 1)$. With this tweak, a difference in $v$ always results in a difference in $s_{16}$ if there is no difference in $u$, and the attack against ZUC 1.4 can no longer be applied. In the later versions ZUC 1.5 and 1.6 (which have almost the same specifications), the computation of $s_{16}$ is modified using our suggested method.
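The difference between the two update rules can be illustrated on a toy analogue. The sketch below replaces the 31-bit words by 5-bit words with modulus $2^5 - 1 = 31$, and it assumes (this rule is not restated in this section) that a zero result for $s_{16}$ is replaced by the modulus, with $v$ ranging over $1, \ldots, 2^5 - 1$. The function names are illustrative.

```python
BITS = 5                 # toy word size; ZUC uses 31 bits
P = (1 << BITS) - 1      # toy modulus 2^5 - 1 = 31

def update_xor(v, u):
    """XOR-style update; a zero result is replaced by P (assumed rule)."""
    s = v ^ u
    return s if s != 0 else P

def update_add(v, u):
    """Tweaked update: addition modulo P; a zero result is replaced by P."""
    s = (v + u) % P
    return s if s != 0 else P

def collisions(update, u):
    """Pairs v < v' in 1..P mapped to the same value by update(., u)."""
    seen = {}
    pairs = []
    for v in range(1, P + 1):
        s = update(v, u)
        if s in seen:
            pairs.append((seen[s], v))
        seen[s] = v
    return pairs

xor_collisions = {u: collisions(update_xor, u) for u in range(P + 1)}
add_collisions = {u: collisions(update_add, u) for u in range(P + 1)}

print("XOR update, u = 3:", xor_collisions[3])
print("ADD update, u = 3:", add_collisions[3])
```

In this toy model the XOR rule collides exactly when the two values of $v$ are bitwise complements and $u$ equals one of them, which mirrors the condition $u = v$ or $u = v'$ used in the key recovery, while the additive rule has no collision for any fixed $u$.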

7 Conclusion 

In this paper, we developed two chosen IV attacks against the initialization of 
ZUC 1.4. In our attacks, identical keystreams are generated from different IVs, 
then key recovery attacks are applied. Our attacks are independent of the number 
of steps in initialization. The lesson from this paper is that when non-injective 
functions are used in cipher design, we should pay special attention to ensure 
that the difference cannot be eliminated with high probability. 
B Generating Identical Keystreams for $\Delta iv_1$

Here we describe in more detail the algorithm used to generate identical keystreams for the IV difference at $iv_1$:

1. Initialize $iv_0, iv_1, \ldots, iv_{15}$ with 0. Set $iv_{13} = 64$.

2. Denote $(iv_4 + 8\,iv_{13} + 16\,iv_{10})$ as $sum_1$ and guess $sum_1$ with 1 of the 6376 possible values.

3. Guess W 2 W, 2], and compute v, until the condition u[1..7] — (v » 8)[1..7] < 1 
is satisfied. If not possible, go to (2) . 

4. Guess $iv_7$ and $iv_{11}$, and compute $u$, until $u[24..31] = \texttt{0xff}$ is satisfied. We store the intermediate state $s_{16}$. If not possible, go to (3).

5. Guess $iv_{15}$ and re-compute $u$, until $u[1..7] = u[9..15]$ and $u[8] = 0$ are satisfied. If not possible, go to (4).

6. Now we compare the current $s_{16}$ with the stored $s_{16}$ to capture the change. By properly changing $iv_2$ and $iv_{13}$ (this is the reason $iv_{13}$ is initialized as 64), we can always change the current $s_{16}$ back to the saved value. Hence, $u[24..31]$ will remain.

7. Determine iv 1 as follows: 

- Ifu[8] ^ u[16], then if u[1..16] < u[1..16] is satisfied, ivi = 256+u[1..16] — 
u[1..16] and update v, otherwise, go to (5). 

- If u[8] = u[16], then if u[1..16] >= u[1..16] is satisfied, iv 1 = u[1..16] — 
u[1..16] and update v, otherwise, go to (5). 

8. Guess iv 0, ivs and iv 14, compute v, until u[16..31] = Oxffff. If not possible, 
go to (5). 

9. If (u® w)[l] = 1, let iv 2 = iv 2 ® 2. Choose W 3 properly to ensure tt[16..23] = 
Oxff. Check if we indeed have v = u, then output iv 0, iv 1, • • • , iv 15. Other- 
wise, go to (8). 

In this algorithm, we restrict the forms of $v$ and $u$ to those starting with 0x7fff to reduce the search space.
Abstract. We analyze the security of the iterated Even-Mansour cipher (a.k.a. key-alternating cipher), a very simple and natural construction of a blockcipher in the random permutation model. This construction, first considered by Even and Mansour (J. Cryptology, 1997) with a single permutation, was recently generalized to use $t$ permutations in the work of Bogdanov et al. (EUROCRYPT 2012). They proved that the construction is secure up to $O(N^{2/3})$ queries (where $N$ is the domain size of the permutations), as soon as the number $t$ of rounds is 2 or more. This is tight for $t = 2$; however, in the general case the best known attack requires $\Omega(N^{t/(t+1)})$ queries. In this paper, we give asymptotically tight security proofs for two types of adversaries:

1. for non-adaptive chosen-plaintext adversaries, we prove that the construction achieves an optimal security bound of $O(N^{t/(t+1)})$ queries;

2. for adaptive chosen-plaintext and ciphertext adversaries, we prove that the construction achieves security up to $O(N^{t/(t+2)})$ queries (for $t$ even). This improves previous results for $t \ge 6$.

Our proof crucially relies on the use of a coupling to upper-bound the statistical distance of the outputs of the iterated Even-Mansour cipher to the uniform distribution.

Keywords: blockcipher, Even-Mansour cipher, key-alternating cipher, random permutation model, coupling, provable security.


1 Introduction 

The Even-Mansour Cipher. Even and Mansour proposed the following “minimal” construction of a blockcipher on message space $\{0,1\}^n$: given a public permutation $P$ on $\{0,1\}^n$ (e.g. AES-128 with a fixed, publicly known key), encrypt $x$ by computing $y = k_1 \oplus P(k_0 \oplus x)$, where $k_0, k_1$ are two $n$-bit keys. Their work was motivated by the DESX construction proposed by Rivest (1984, unpublished) and later formally analyzed by Kilian and Rogaway, in which Rivest suggested strengthening DES against exhaustive key search by using two independent pre-whitening and post-whitening keys XORed respectively to the input and the output of DES (thereby increasing the key size of the resulting cipher from 56 to 184 bits). Even and Mansour analyzed their proposal in the random permutation model, where $P$ is replaced by an oracle implementing a random (invertible) permutation, publicly accessible to all parties including the adversary. They showed that an adversary with black-box access to both $P$ and the cipher with a random unknown key (as well as their inverses) has only a negligible probability of correctly inverting the cipher on an un-queried ciphertext of its choice (or of computing the ciphertext corresponding to some un-queried plaintext). In fact, the Even-Mansour cipher yields a (strong) pseudorandom permutation (in the random permutation model) in the sense that the system $(P, \mathrm{EM}_{P,(k_0,k_1)})$, where $\mathrm{EM}_{P,(k_0,k_1)}$ is the Even-Mansour cipher built from $P$ with two uniformly random keys $k_0$ and $k_1$, is indistinguishable from an ideal system $(P, Q)$, where $Q$ is an independent random permutation. More precisely, any distinguisher has to make $\Omega(2^{n/2})$ queries to distinguish these two systems with non-negligible advantage.

The Iterated Even-Mansour Cipher. The Even-Mansour cipher was recently generalized in a very natural way by Bogdanov et al. as follows: given $t$ public permutations $P_1, \ldots, P_t$ on $\{0,1\}^n$, encrypt $x$ by computing:

$$y = k_t \oplus P_t(k_{t-1} \oplus P_{t-1}(\cdots P_1(k_0 \oplus x) \cdots)),$$

where $k_0, \ldots, k_t$ are $t+1$ keys of $n$ bits. They used the moniker key-alternating cipher (first coined in earlier work) for this construction, but we will prefer the name iterated Even-Mansour cipher in this paper to emphasize that we work in the random permutation model. We will refer to $t$ as the number of rounds of the construction.

The main result of Bogdanov et al. is a proof (again, in the random permutation model for $P_1, \ldots, P_t$) that the iterated Even-Mansour cipher with $t \ge 2$ rounds is secure (i.e., indistinguishable from an independent random permutation) up to $O(N^{2/3})$ queries (where $N = 2^n$). They also gave a distinguishing attack (in fact a key-recovery attack) requiring $\Omega(N^{t/(t+1)})$ queries. Hence, their analysis is tight for $t = 2$, but they left the security gap for $t > 2$ as an interesting open problem.

Our Contribution. In this work, we strengthen the security bounds of Bogdanov et al. We obtain two distinct results depending on which type of adversaries we consider. For non-adaptive chosen-plaintext (NCPA for short) adversaries, we prove that the iterated Even-Mansour cipher with $t$ rounds is secure up to $O(N^{t/(t+1)})$ queries. Given that the attack described by Bogdanov et al. falls into this category of adversaries, this is tight up to constant factors. Though this type of adversary was not explicitly considered by Bogdanov et al., we note that this improves their general bound as soon as $t \ge 3$.

For adaptive chosen-plaintext and ciphertext (CCA for short) adversaries (i.e. the most powerful ones in terms of how queries may be issued to the system), we prove that the iterated Even-Mansour cipher with $t$ rounds is secure up to $O(N^{t/(t+2)})$ queries when $t$ is even. When $t$ is odd, we get the same bound as for $t - 1$ (since it is clear that adding a round to the construction cannot improve the advantage of a distinguisher). Our bound becomes better than $O(N^{2/3})$, therefore improving the result of Bogdanov et al., for $t \ge 6$. In particular, for $t = 6$, we obtain an improved security bound of $O(N^{3/4})$ queries. Our findings are summarized in Table 1.

Our Techniques. Our proof strategy is very different from, and much simpler than, that of Bogdanov et al. (the counterpart being that, for the interesting case of CCA adversaries, we improve their results only for “large” $t$, where large means at least 6). One of the main ingredients of our proof is a well-known tool from the theory of Markov chains, namely the coupling technique. Indeed, a crucial step of our proof is to upper-bound, for any possible tuple of plaintext queries $(x^1, \ldots, x^{q_e})$ to the iterated Even-Mansour cipher, the statistical distance of the outputs of the cipher to the uniform distribution, conditioned on some partial information about the inner permutations $P_1, \ldots, P_t$ (namely equations of the form $P_i(a) = b$) gathered from the queries to these permutations. The outputs of permutations $P_i$, $i = 1, \ldots, t$, when computing the ciphertexts for inputs $(x^1, \ldots, x^{q_e})$, can be seen as the state of a Markov chain, so that we can reformulate the problem as studying how quickly the distribution of this Markov chain converges to the uniform distribution (as a function of the number of rounds). The coupling technique is one of the most efficient ways to analyze this convergence rate (often named the mixing time of the Markov chain), and this is exactly the technique we adopt. Couplings were previously used in cryptography by Mironov to analyze the RC4 stream cipher, and more recently by Morris et al. to study maximally unbalanced Feistel networks and by Hoang and Rogaway, who generalized the results of Morris et al. to many variants of the Feistel construction. In fact, our analysis was strongly inspired by these works.

However, the coupling technique only allows us to treat adversaries choosing their queries to the cipher non-adaptively. To lift the result from NCPA-security to CCA-security, we use a composition strategy which is very similar to what is often referred to as the “two weak make one strong” technique. For “classical” pseudorandom permutations (i.e. not built from ideal primitives as the Even-Mansour cipher is), this strategy makes it possible to prove the following: if $\{F_k\}$ and $\{G_{k'}\}$ are two permutation families secure against NCPA attacks (with upper bounds $\varepsilon_F$ and $\varepsilon_G$, respectively, on the advantage of any NCPA-distinguisher), then the composition $\{G_{k'}^{-1} \circ F_k\}$ is secure against CCA attacks (with advantage upper-bounded by $\varepsilon_F + \varepsilon_G$). This was proved by Maurer and Pietrzak [13] up to logarithmic factors and then refined by Maurer et al. in the formalism of random systems. However, subtle complications appear when trying to use these results directly because of the additional inner permutation oracles $P_1, \ldots, P_t$, so we prefer a more direct approach, very similar to the “H coefficients” technique of Patarin.
A Caveat. We warn that the value of our results is similar to that of security proofs in the random oracle model, meaning that they offer no guarantee once the inner permutations are instantiated with real, standard permutations. They show, however, that any attack beating our bounds cannot use the inner permutations as black boxes.

Table 1. Summary of our results. The NCPA (resp. CCA) column gives the constant $c$ such that the iterated Even-Mansour cipher is secure up to $N^c$ queries against NCPA-distinguishers (resp. CCA-distinguishers). Gray cells indicate when we improve the $N^{2/3}$ bound of Bogdanov et al. The last column gives, for $n = 128$, the log in base 2 of the minimal number of queries a CCA-distinguisher has to make to have advantage at least 1/2 in distinguishing the cipher from random (we only give this number when our bound improves that of Bogdanov et al.).

| t | NCPA | CCA | CCA (n = 128) |
| 2 | 2/3  | 1/2 |               |
| 3 | 3/4  | 1/2 |               |
| 4 | 4/5  | 2/3 |               |
| 5 | 5/6  | 2/3 |               |
| 6 | 6/7  | 3/4 | 93            |
| 7 | 7/8  | 3/4 | 93            |
| 8 | 8/9  | 4/5 | 100           |

Related Work. We focus on security proofs in this work, but we stress that quite a few papers explored attacks (mainly key-recovery ones) against the Even-Mansour cipher. Daemen gave a differential-style attack requiring $q_p$ (direct) chosen queries to $P$ and $q_e$ chosen plaintext queries to the cipher, with $q_p q_e = \Omega(2^n)$ (hence the total query complexity is minimized for $q_p = q_e = \Omega(2^{n/2})$). Later, Biryukov and Wagner gave an attack requiring $\Omega(2^{n/2})$ queries to both $P$ and the cipher, but allowing the use of known plaintexts rather than chosen ones. However, their method does not allow any trade-off between queries to $P$ and the cipher as is possible in Daemen’s attack. Recently, Dunkelman et al. refined the work of Biryukov and Wagner by giving a known-plaintext attack where such a trade-off is possible, thereby providing an optimal attack on the Even-Mansour cipher.

On the provable-security side, Gentry and Ramzan showed that the Even-Mansour cipher remains secure when the random permutation oracle $P$ is replaced by a Feistel construction with four rounds, where the round functions are public random function oracles.

Open Problems. Our work settles the case of non-adaptive chosen-plaintext adversaries; there remains however a gap for adaptive chosen-plaintext and ciphertext attacks between the proven bound of $O(N^{t/(t+2)})$ queries and the best attack requiring $\Omega(N^{t/(t+1)})$ queries. The two practically appealing cases where all keys are identical (as was for example recently proposed in the blockcipher LED), and where all inner permutations are identical, also remain interesting directions of research. It may even be possible that using both identical keys and a single inner permutation provides some level of security greater than $2^{n/2}$.¹

Organization. In Section 2 we introduce the general notation, formally define the adversarial model, and give the necessary background on couplings. In Section 3 we prove our main result on the statistical distance of the outputs of the iterated Even-Mansour cipher to the uniform distribution using a coupling, which enables us to treat NCPA-adversaries. In Section 4 we deal with CCA-adversaries.


2 Preliminaries 

2.1 General Notation 

In all the following, we fix an integer $n \ge 1$. We denote $\mathcal{I}_n = \{0,1\}^n$ the set of binary strings of length $n$ and $N = 2^n$. Given an integer $q \ge 1$, we denote $(\mathcal{I}_n)^{*q}$ the set of all sequences of pairwise distinct elements of $\mathcal{I}_n$ of length $q$. Given integers $q_1, \ldots, q_t$ we denote $(\mathcal{I}_n)^{*q_1,\ldots,q_t} = (\mathcal{I}_n)^{*q_1} \times \cdots \times (\mathcal{I}_n)^{*q_t}$. We denote $(N)_q = N(N-1)\cdots(N-q+1)$ the falling factorial. Note that $|(\mathcal{I}_n)^{*q}| = (N)_q$. We denote $[i;j]$ the set of integers $k$ such that $i \le k \le j$.

The set of permutations on $\mathcal{I}_n$ will be denoted $\mathcal{P}_n$. Given $P \in \mathcal{P}_n$ and two sequences $x = (x^1, \ldots, x^q)$ and $y = (y^1, \ldots, y^q)$ of $(\mathcal{I}_n)^{*q}$, we will write $P(x) = y$ to mean that $P(x^i) = y^i$ for $i = 1, \ldots, q$. Given a tuple of permutations $P = (P_1, \ldots, P_t) \in (\mathcal{P}_n)^t$ and two sequences $a = (a_1, \ldots, a_t)$ and $b = (b_1, \ldots, b_t)$ of $(\mathcal{I}_n)^{*q_1,\ldots,q_t}$, with $a_i = (a_i^1, \ldots, a_i^{q_i})$ and $b_i = (b_i^1, \ldots, b_i^{q_i})$, we will write $P(a) = b$ to mean that $P_i(a_i) = b_i$ for $i = 1, \ldots, t$ (i.e. $P_i(a_i^j) = b_i^j$ for $j = 1, \ldots, q_i$).

Given a value $k \in \{0,1\}^n$, $\oplus_k$ denotes the mapping $x \mapsto x \oplus k$ from $\{0,1\}^n$ to itself. Fix an integer $t \ge 1$. Let $P = (P_1, \ldots, P_t)$ be a tuple of permutations on $\{0,1\}^n$. Then the iterated Even-Mansour cipher associated with $P$ is the cipher with message space $\{0,1\}^n$ and key space $(\{0,1\}^n)^{t+1}$ where the permutation associated with key $k = (k_0, \ldots, k_t)$ is defined as (see Fig. 1):

$$\mathrm{EM}_{P,k} = \oplus_{k_t} \circ P_t \circ \oplus_{k_{t-1}} \circ \cdots \circ \oplus_{k_1} \circ P_1 \circ \oplus_{k_0}.$$

We denote $\Omega_t = (\mathcal{P}_n)^t \times (\mathcal{I}_n)^{t+1}$. An element $(P, k)$ of $\Omega_t$ names a tuple of permutations and a key for the resulting Even-Mansour cipher.
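To make the definition concrete, here is a toy sketch of the iterated Even-Mansour cipher on 8-bit blocks. The block size, the seed, the use of Python's `random` module to sample the public permutations, and all function names are illustrative assumptions; this is a model for experimentation, not a secure cipher.

```python
import random

N_BITS = 8                  # toy block size n
SIZE = 1 << N_BITS          # N = 2^n

def random_permutation(rng):
    """Sample a permutation of {0,1}^n as a lookup table."""
    table = list(range(SIZE))
    rng.shuffle(table)
    return table

def invert(table):
    """Return the lookup table of the inverse permutation."""
    inverse = [0] * len(table)
    for x, y in enumerate(table):
        inverse[y] = x
    return inverse

def em_encrypt(perms, keys, x):
    """XOR k_0, then for i = 1..t apply P_i and XOR k_i."""
    assert len(keys) == len(perms) + 1
    y = x ^ keys[0]
    for perm, key in zip(perms, keys[1:]):
        y = perm[y] ^ key
    return y

def em_decrypt(perms, keys, y):
    """Undo the rounds from t down to 1, then remove k_0."""
    for perm, key in zip(reversed(perms), reversed(keys[1:])):
        y = invert(perm)[y ^ key]
    return y ^ keys[0]

rng = random.Random(2012)   # fixed seed so the toy run is reproducible
t = 3
perms = [random_permutation(rng) for _ in range(t)]
keys = [rng.randrange(SIZE) for _ in range(t + 1)]
ciphertexts = [em_encrypt(perms, keys, x) for x in range(SIZE)]
```

Since each round is a composition of bijections, the resulting map is itself a permutation of the message space, which the checks below confirm for the toy parameters.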


2.2 Distinguishers 

We consider distinguishers interacting with systems constituted of $t+1$ permutations. A query to such a system is a triplet $(i, b, z)$ where $i \in [1; t+1]$ names which permutation is being queried, $b$ is a bit indicating whether the query is forward or backward, and $z \in \{0,1\}^n$ is the actual query to the permutation. The goal of the distinguisher is to tell whether it is interacting with a tuple of $t+1$ uniformly random and independent (URI for short) permutations $(P_1, \ldots, P_t, Q)$, or with $(P_1, \ldots, P_t, \mathrm{EM}_{P,k})$ where $(P_1, \ldots, P_t)$ are URI and $\mathrm{EM}_{P,k}$ is the Even-Mansour cipher associated with $P = (P_1, \ldots, P_t)$ with a uniformly random key $k = (k_0, \ldots, k_t)$. In the following we will refer to the first $t$ permutations of the system as the inner permutations, as opposed to the last permutation of the system (which may be an independent random permutation $Q$ or the Even-Mansour cipher $\mathrm{EM}_{P,k}$), to which we will refer as the outer permutation. A $(q_1, \ldots, q_t, q_e)$-distinguisher is a distinguisher that makes at most $q_i$ queries to inner permutation $P_i$ for $i = 1, \ldots, t$ and $q_e$ queries to the outer permutation. We will consider only computationally unbounded distinguishers. As usual, we restrict ourselves w.l.o.g. to deterministic distinguishers that never make redundant queries and always make the maximal number of allowed queries to each permutation of the system.

Fig. 1. The iterated Even-Mansour cipher

1 Note however that it has been observed that using $P$ and $P^{-1}$ for the construction with $t = 2$ rounds causes the security to drop to $2^{n/2}$, even with three independent keys.

The way we define chosen-plaintext/-ciphertext and adaptive/non-adaptive distinguishers is very specific to the context of our work. The qualifier chosen-plaintext/-ciphertext only refers to the queries the distinguisher is allowed to make to the outer permutation of the system (it is always allowed to make both forward and backward queries to the inner permutations). Likewise, adaptivity only refers to how the distinguisher is allowed to choose its queries to the outer permutation (it is always allowed to choose its queries to the inner permutations adaptively), and also to whether the distinguisher is allowed to query the inner permutations as a function of the answers received from the outer permutation. We now give a precise definition of the two types of distinguishers we consider: non-adaptive chosen-plaintext (NCPA) distinguishers and adaptive chosen-plaintext and ciphertext (CCA) distinguishers.

Definition 1. A $(q_1, \ldots, q_t, q_e)$-NCPA-distinguisher runs in two phases:

1. in a first phase, it can only query the inner permutations $(P_1, \ldots, P_t)$. These queries can be adaptive, and both forward and backward queries are allowed. During this phase it makes exactly $q_i$ queries to $P_i$ for $i = 1, \ldots, t$;

2. in a second phase, it chooses a tuple of $q_e$ non-adaptive² forward queries $x = (x^1, \ldots, x^{q_e})$ to the outer permutation of the system, and receives the corresponding answers.

A $(q_1, \ldots, q_t, q_e)$-CCA-distinguisher is the most general one: it is allowed to make both forward and backward queries to all permutations of the system, in any order it wishes (in particular it may interleave queries to the outer permutation and to the inner permutations).

2 By non-adaptive we mean that all queries have to be chosen before receiving any corresponding answer from the outer permutation. However the choice of $x$ may depend on the answers received from the inner permutations during the first phase.

In all the following, the probability of an event $E$ when $\mathcal{D}$ interacts with $t+1$ URI permutations $(P_1, \ldots, P_t, Q)$ will simply be denoted $\Pr^*[E]$, whereas the probability of an event $E$ when $\mathcal{D}$ interacts with $(P_1, \ldots, P_t, \mathrm{EM}_{P,k})$, where $P = (P_1, \ldots, P_t)$ are URI permutations and the key $k$ is uniformly random, will simply be denoted $\Pr[E]$. With these notations, the advantage of a distinguisher $\mathcal{D}$ is defined as $|\Pr[\mathcal{D}(1^n) = 1] - \Pr^*[\mathcal{D}(1^n) = 1]|$ (we omit the oracles in this notation since they can be deduced from the notation $\Pr[\cdot]$ or $\Pr^*[\cdot]$). The maximum advantage of a $(q_1, \ldots, q_t, q_e)$-ATK-distinguisher against the iterated Even-Mansour cipher with $t$ rounds (where ATK is NCPA or CCA) will be denoted $\mathbf{Adv}^{\mathrm{atk}}_{\mathrm{EM}[t]}(q_1, \ldots, q_t, q_e)$. When considering distinguishers making at most $q$ queries in total, we simply denote $\mathbf{Adv}^{\mathrm{atk}}_{\mathrm{EM}[t]}(q)$.

Remark 1. We warn that our NCPA-security notion should not be considered as interesting in itself, but rather as a preliminary step towards proving CCA-security. The reason why it is rather artificial is that once the distinguisher has received the answers to its queries to the outer permutation, it is not allowed to query the inner permutations any more. This is not satisfying since these permutations are public primitives, and hence adversaries should be allowed to query them at their entire discretion.

2.3 Total Variation Distance and Coupling 

Given a finite event space $\Omega$ and two probability distributions $\mu$ and $\nu$ defined on $\Omega$, the total variation distance (or statistical distance) between $\mu$ and $\nu$, denoted $\|\mu - \nu\|$, is defined as:

$$\|\mu - \nu\| = \frac{1}{2} \sum_{x \in \Omega} |\mu(x) - \nu(x)|.$$

The following definitions can easily be seen to be equivalent:

$$\|\mu - \nu\| = \max_{S \subset \Omega}\{\mu(S) - \nu(S)\} = \max_{S \subset \Omega}\{\nu(S) - \mu(S)\} = \max_{S \subset \Omega}\{|\mu(S) - \nu(S)|\}.$$

A coupling of $\mu$ and $\nu$ is a distribution $\lambda$ on $\Omega \times \Omega$ such that for all $x \in \Omega$, $\sum_{y \in \Omega} \lambda(x, y) = \mu(x)$, and for all $y \in \Omega$, $\sum_{x \in \Omega} \lambda(x, y) = \nu(y)$. In other words, $\lambda$ is a joint distribution whose marginal distributions are respectively $\mu$ and $\nu$. The fundamental result of the coupling technique is the following one. For completeness, we provide the proof in the appendix.

Lemma 1 (Coupling Lemma). Let $\mu$ and $\nu$ be probability distributions on a finite event space $\Omega$, let $\lambda$ be a coupling of $\mu$ and $\nu$, and let $(X, Y) \sim \lambda$ (i.e. $(X, Y)$ is a random variable sampled according to distribution $\lambda$). Then $\|\mu - \nu\| \le \Pr[X \ne Y]$.
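A small numerical illustration of these definitions may help. The sketch below computes the total variation distance in two of the equivalent ways and checks the Coupling Lemma for one particular coupling, the independent (product) coupling, which is a simple choice made here for illustration and is generally far from optimal. The distributions and function names are assumptions of the example.

```python
from fractions import Fraction
from itertools import chain, combinations

def tv_distance(mu, nu):
    """Total variation distance as half the L1 distance."""
    return sum(abs(mu[x] - nu[x]) for x in mu) / 2

def tv_by_max(mu, nu):
    """The same distance as the maximum of mu(S) - nu(S) over all subsets S."""
    points = list(mu)
    subsets = chain.from_iterable(
        combinations(points, r) for r in range(len(points) + 1))
    return max(sum(mu[x] - nu[x] for x in s) for s in subsets)

def independent_coupling(mu, nu):
    """Product distribution: marginals are mu and nu, so it is a coupling."""
    return {(x, y): mu[x] * nu[y] for x in mu for y in nu}

def prob_differ(coupling):
    """Pr[X != Y] for (X, Y) sampled from the coupling."""
    return sum(p for (x, y), p in coupling.items() if x != y)

mu = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}
nu = {x: Fraction(1, 4) for x in range(4)}   # uniform on four points

distance = tv_distance(mu, nu)
bound = prob_differ(independent_coupling(mu, nu))
print("distance:", distance, " Pr[X != Y]:", bound)
```

The gap between the two printed values shows why the choice of coupling matters: the lemma holds for every coupling, but only a well-designed one gives a useful bound.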
For the analysis of CCA attacks, we will rely on the following observation.

Lemma 2. Let $\Omega$ be some finite event space and $\nu$ be the uniform probability distribution on $\Omega$. Let $\mu$ be a probability distribution on $\Omega$ such that $\|\mu - \nu\| \le \varepsilon$. Then there is a set $S \subset \Omega$ such that:

- $|S| \ge (1 - \sqrt{\varepsilon})|\Omega|$

- $\forall x \in S$, $\mu(x) \ge (1 - \sqrt{\varepsilon})\nu(x)$

Proof. Define $S = \{x \in \Omega : \mu(x) \ge (1 - \sqrt{\varepsilon})\nu(x)\}$. We will show that $|S| \ge (1 - \sqrt{\varepsilon})|\Omega|$. Assume for contradiction that $|S| < (1 - \sqrt{\varepsilon})|\Omega|$, or equivalently $|\bar{S}| > \sqrt{\varepsilon}|\Omega|$, i.e. $\nu(\bar{S}) > \sqrt{\varepsilon}$. By definition, for any $x \in \bar{S}$, $\nu(x) - \mu(x) > \sqrt{\varepsilon}\,\nu(x)$. Consequently,

$$\nu(\bar{S}) - \mu(\bar{S}) > \sqrt{\varepsilon}\,\nu(\bar{S}) > (\sqrt{\varepsilon})^2 = \varepsilon,$$

a contradiction with $\|\mu - \nu\| \le \varepsilon$. □
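The statement of Lemma 2 can be checked empirically on random distributions. The sketch below is a numerical illustration under assumed parameters (a 32-element space, 200 trials, a fixed seed); it takes $\varepsilon$ to be the actual distance to uniform, which is the smallest value for which the hypothesis holds, and uses floating-point arithmetic with a small tolerance.

```python
import math
import random

def tv_to_uniform(mu):
    """Total variation distance between mu and the uniform distribution."""
    nu_x = 1.0 / len(mu)
    return 0.5 * sum(abs(p - nu_x) for p in mu)

def lemma2_set(mu, eps):
    """Indices x with mu(x) >= (1 - sqrt(eps)) * nu(x), nu uniform."""
    nu_x = 1.0 / len(mu)
    threshold = (1.0 - math.sqrt(eps)) * nu_x
    return [x for x, p in enumerate(mu) if p >= threshold]

def random_distribution(size, rng):
    """A random probability vector obtained by normalizing uniform weights."""
    weights = [rng.random() for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)
checks = []
for _ in range(200):
    mu = random_distribution(32, rng)
    eps = tv_to_uniform(mu)
    s = lemma2_set(mu, eps)
    # Lemma 2 predicts |S| >= (1 - sqrt(eps)) * |Omega|.
    checks.append(len(s) >= (1.0 - math.sqrt(eps)) * len(mu) - 1e-9)

print("lemma holds on all samples:", all(checks))
```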

3 Security against Non-adaptive Distinguishers 

In this section, we start by dealing with NCPA-distinguishers. The crucial point will be to upper-bound the statistical distance between the outputs of the iterated Even-Mansour cipher conditioned on partial information on the inner permutations (namely $P(a) = b$ for some tuples $a, b \in (\mathcal{I}_n)^{*q_1,\ldots,q_t}$) and the uniform distribution on $(\mathcal{I}_n)^{*q_e}$. We introduce the following important definitions and notations.

Definition 2. Let $q_1, \ldots, q_t, q_e$ be positive integers. Fix tuples $a, b \in (\mathcal{I}_n)^{*q_1,\ldots,q_t}$ and $x \in (\mathcal{I}_n)^{*q_e}$. We denote $\mu_x(\cdot \mid P(a) = b)$ the distribution of $\mathrm{EM}_{P,k}(x)$ conditioned on the event $P(a) = b$ (i.e. when the key $k = (k_0, \ldots, k_t)$ is uniformly random and the permutations $P = (P_1, \ldots, P_t)$ are uniformly random among permutations satisfying $P(a) = b$). We also denote $\mu^*_{q_e} = 1/(N)_{q_e}$ the uniform distribution on $(\mathcal{I}_n)^{*q_e}$.

We have the following expression for $\mu_x(\cdot \mid P(a) = b)$.

Lemma 3. Let $a, b \in (\mathcal{I}_n)^{*q_1,\ldots,q_t}$ and $x \in (\mathcal{I}_n)^{*q_e}$. Then for any $y \in (\mathcal{I}_n)^{*q_e}$ one has:

$$\mu_x(y \mid P(a) = b) = \frac{\#\{(P, k) \in \Omega_t : P(a) = b \wedge \mathrm{EM}_{P,k}(x) = y\}}{|\Omega_t| / \prod_{i=1}^{t} (N)_{q_i}}.$$

Proof. This follows easily from the observation that the number of $(P, k) \in \Omega_t$ such that $P(a) = b$ is $|\Omega_t| / \prod_{i=1}^{t} (N)_{q_i}$. □

The following lemma states that the advantage of an NCPA-distinguisher is upper-bounded by the total variation distance between $\mu_x(\cdot \mid P(a) = b)$ and $\mu^*_{q_e}$. This is a classical result regarding the advantage of the best NCPA-distinguisher for a pseudorandom permutation; however, we need to adapt it here to fit the random permutation model.

Lemma 4. Let $q_1, \ldots, q_t, q_e$ be positive integers. Assume that there exists $\alpha$ such that for any tuples $a, b \in (\mathcal{I}_n)^{*q_1,\ldots,q_t}$ and $x \in (\mathcal{I}_n)^{*q_e}$, one has

$$\|\mu_x(\cdot \mid P(a) = b) - \mu^*_{q_e}\| \le \alpha.$$

Then $\mathbf{Adv}^{\mathrm{ncpa}}_{\mathrm{EM}[t]}(q_1, \ldots, q_t, q_e) \le \alpha$.

Proof. Fix a $(q_1, \ldots, q_t, q_e)$-NCPA-distinguisher $\mathcal{D}$. Such a distinguisher first queries the inner permutations $(P_1, \ldots, P_t)$. Let $\tau$ be the resulting transcript, i.e. the ordered sequence of $q_1 + \cdots + q_t$ queries with the corresponding answers $(i, b, z, z')$, where $i \in [1; t]$ names which permutation is being queried, $b$ is a bit indicating whether the query is forward or backward, $z \in \{0,1\}^n$ is the actual query and $z'$ the answer. Let also $\Phi$ be the function that maps a tuple of permutations $P = (P_1, \ldots, P_t)$ to the transcript of the first phase of the attack when $\mathcal{D}$ interacts with $(P_1, \ldots, P_t, *)$, where $*$ is either an independent random permutation $Q$ or $\mathrm{EM}_{P,k}$ (this is clearly irrelevant since $\mathcal{D}$ does not query the outer permutation during the first phase of the attack). We say that a transcript $\tau$ is consistent if there exists a tuple of permutations $P$ such that $\Phi(P) = \tau$, and we denote $\mathcal{T}$ the set of consistent transcripts. Finally, from a consistent transcript $\tau$, we build the sequences $a(\tau), b(\tau) \in (\mathcal{I}_n)^{*q_1,\ldots,q_t}$ as follows: let $(i, b, z, z')$ be the $j$-th query and corresponding answer to $P_i$ in the transcript. If this is a forward query ($b = 0$), then we define $a_i^j = z$ and $b_i^j = z'$; else, when this is a backward query ($b = 1$), we define $a_i^j = z'$ and $b_i^j = z$. Note that for a consistent transcript $\tau$, $\Phi(P) = \tau$ iff $P(a(\tau)) = b(\tau)$. The number of consistent transcripts can be exactly determined:

$$|\mathcal{T}| = \prod_{i=1}^{t} (N)_{q_i}. \qquad (1)$$

This can be easily seen as follows. The first query of $\mathcal{D}$ is fixed in all executions. Assume w.l.o.g. that this is a query to $P_1$. There are exactly $N$ possible answers. The next query is determined by the answer received to the first query. If this is again a query to $P_1$, there are now $N - 1$ possible answers, whereas if this is a query to $P_i$, $i \ne 1$, there are $N$ possible answers. This can be easily extended by induction to obtain the above claim.

The tuple of non-adaptive plaintext queries $x = (x^1, \ldots, x^{q_e}) \in (\mathcal{I}_n)^{*q_e}$ of $\mathcal{D}$
to the outer permutation is a deterministic function of the transcript $\tau$ of the
first phase of the attack. Let $\Psi$ denote the function which maps a consistent
transcript $\tau$ to the corresponding tuple of queries. The output of $\mathcal{D}$ is then a
deterministic function of $\tau$ and the answers $y = (y^1, \ldots, y^{q_e})$ received from the
outer permutation to the tuple of queries $\Psi(\tau)$. For any consistent transcript $\tau$,
we denote by $\Sigma_\tau$ the set of tuples $y$ such that $\mathcal{D}$ outputs 1 when receiving answers
$y$ to the queries $\Psi(\tau)$. Then, by definition we have:


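$$\Pr^*[\mathcal{D}(1^n) = 1] = \sum_{\tau \in \Gamma} \sum_{y \in \Sigma_\tau} \frac{\#\{(P,Q) \in (\mathcal{P}_n)^{t+1} : \Phi(P) = \tau \wedge Q(\Psi(\tau)) = y\}}{|\mathcal{P}_n|^{t+1}}$$

$$= \sum_{\tau \in \Gamma} \sum_{y \in \Sigma_\tau} \frac{\#\{(P,Q) \in (\mathcal{P}_n)^{t+1} : P(a(\tau)) = b(\tau) \wedge Q(\Psi(\tau)) = y\}}{|\mathcal{P}_n|^{t+1}}$$

$$= \sum_{\tau \in \Gamma} \sum_{y \in \Sigma_\tau} \frac{1}{(N)_{q_e} \prod_{i=1}^{t} (N)_{q_i}} . \qquad (2)$$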


Also, we have: 


$$\Pr[\mathcal{D}(1^n) = 1] = \sum_{\tau \in \Gamma} \sum_{y \in \Sigma_\tau} \frac{\#\{(P,k) \in \Omega_t : \Phi(P) = \tau \wedge \mathsf{EM}_{P,k}(\Psi(\tau)) = y\}}{|\Omega_t|} . \qquad (3)$$

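We now use the assumption that, for all tuples $a, b \in (\mathcal{I}_n)^{*q_1,\ldots,q_t}$ and $x \in (\mathcal{I}_n)^{*q_e}$,
one has $\|\mu_x(\cdot|P(a) = b) - \mu^*_{q_e}\| \le \alpha$. By Lemma 01 this exactly means that for
all tuples $a, b, x$ and any subset $S \subset (\mathcal{I}_n)^{*q_e}$, one has:

$$\left| \sum_{y \in S} \frac{\#\{(P,k) \in \Omega_t : P(a) = b \wedge \mathsf{EM}_{P,k}(x) = y\}}{|\Omega_t| / \prod_{i=1}^{t} (N)_{q_i}} - \sum_{y \in S} \frac{1}{(N)_{q_e}} \right| \le \alpha .$$

For any $\tau \in \Gamma$ we can apply the above inequality with $(a, b) = (a(\tau), b(\tau))$,
$x = \Psi(\tau)$, and $S = \Sigma_\tau$, and divide both sides by $\prod_{i=1}^{t} (N)_{q_i}$, to get:

$$\left| \sum_{y \in \Sigma_\tau} \frac{\#\{(P,k) \in \Omega_t : P(a(\tau)) = b(\tau) \wedge \mathsf{EM}_{P,k}(\Psi(\tau)) = y\}}{|\Omega_t|} - \sum_{y \in \Sigma_\tau} \frac{1}{(N)_{q_e} \prod_{i=1}^{t} (N)_{q_i}} \right| \le \frac{\alpha}{\prod_{i=1}^{t} (N)_{q_i}} . \qquad (4)$$

Combining Eqs. (2), (3) and (4), and using that for a consistent transcript $\tau$, $\Phi(P) = \tau$
iff $P(a(\tau)) = b(\tau)$, we obtain:

$$\left| \Pr[\mathcal{D}(1^n) = 1] - \Pr^*[\mathcal{D}(1^n) = 1] \right| \le \sum_{\tau \in \Gamma} \frac{\alpha}{\prod_{i=1}^{t} (N)_{q_i}} .$$

Finally, we deduce using Eq. (1) that the advantage of $\mathcal{D}$ is at most $\alpha$, which
concludes the proof. □

The rest of this section is devoted to establishing an appropriate upper bound $\alpha$
for $\|\mu_x(\cdot|P(a) = b) - \mu^*_{q_e}\|$ as required to apply Lemma 4. The following lemma
can be regarded as the main contribution of this work.
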
Lemma 5. Let $q_1, \ldots, q_t, q_e$ be positive integers. Fix tuples $a, b \in (\mathcal{I}_n)^{*q_1,\ldots,q_t}$
and $x \in (\mathcal{I}_n)^{*q_e}$. Then:

$$\|\mu_x(\cdot|P(a) = b) - \mu^*_{q_e}\| \le \frac{2^t q_e \prod_{i=1}^{t} q_i}{N^t} .$$

Proof. Fix tuples $a, b \in (\mathcal{I}_n)^{*q_1,\ldots,q_t}$ and $x \in (\mathcal{I}_n)^{*q_e}$, with $x = (x^1, \ldots, x^{q_e})$. For
each $\ell \in [0; q_e]$, let $(z^1, \ldots, z^{q_e})$ be a tuple of queries such that $z^i = x^i$ for $i \le \ell$,
and $z^i$ is uniformly random in $\{0,1\}^n \setminus \{z^1, \ldots, z^{i-1}\}$ for $i > \ell$. Denote by $\nu_\ell$ the
distribution of the tuple of $q_e$ outputs when $\mathsf{EM}_{P,k}$ receives inputs $(z^1, \ldots, z^{q_e})$,
conditioned on $P(a) = b$. Note that $\nu_0 = \mu^*_{q_e}$ since for $\ell = 0$ the tuple of inputs
is uniformly random in $(\mathcal{I}_n)^{*q_e}$, and $\nu_{q_e} = \mu_x(\cdot|P(a) = b)$. Hence we have:

$$\|\mu_x(\cdot|P(a) = b) - \mu^*_{q_e}\| = \|\nu_{q_e} - \nu_0\| \le \sum_{\ell=0}^{q_e - 1} \|\nu_{\ell+1} - \nu_\ell\| . \qquad (5)$$

It remains to upper bound the total variation distance between $\nu_{\ell+1}$ and $\nu_\ell$, for
each $\ell \in [0; q_e - 1]$. For this, we will construct a suitable coupling of the two
distributions. Note that we only have to consider the first $\ell + 1$ elements of the
two tuples of outputs since for both distributions, the $i$-th inputs for $i > \ell + 1$
are sampled at random. In other words, $\|\nu_{\ell+1} - \nu_\ell\| = \|\nu'_{\ell+1} - \nu'_\ell\|$, where $\nu'_{\ell+1}$
and $\nu'_\ell$ are the respective distributions of the $\ell + 1$ first outputs of the cipher. To
define the coupling of $\nu'_{\ell+1}$ and $\nu'_\ell$, we consider the iterated Even-Mansour cipher
$\mathsf{EM}_{P,k}$, where $P$ satisfies $P(a) = b$, that receives inputs $x' = (x^1, \ldots, x^{\ell+1})$, so
that $\mathsf{EM}_{P,k}(x')$ is distributed according to $\nu'_{\ell+1}$. We will construct a second Even-Mansour
cipher $\mathsf{EM}_{P',k'}$, with inputs $u = (u^1, \ldots, u^{\ell+1})$, satisfying the following
properties:

1) $u^i = x^i$ for $i = 1, \ldots, \ell$, and $u^{\ell+1}$ is uniformly random in $\{0,1\}^n \setminus \{u^1, \ldots, u^\ell\}$;

2) for $i = 1, \ldots, \ell + 1$, if the outputs of the $j$-th inner permutation in the
computations of $\mathsf{EM}_{P,k}(x^i)$ and $\mathsf{EM}_{P',k'}(u^i)$ are equal, then this also holds for
any subsequent inner permutation;

3) $P'$ is uniformly random among permutation tuples satisfying $P'(a) = b$ and
$k'$ is uniformly random in $(\mathcal{I}_n)^{t+1}$.

Note that properties 1) and 3) will ensure that $\mathsf{EM}_{P',k'}(u)$ is distributed according
to $\nu'_\ell$. We warn that $(P', k')$ will not be independent from $(P, k)$; however, this
is not required for the Coupling Lemma to apply. The only requirement is that
both $(P, k)$ and $(P', k')$ have the correct marginal distribution.

We now describe how the second iterated Even-Mansour cipher is constructed.
First, it uses exactly the same keys as the original one, namely $k' = (k_0, \ldots, k_t)$.
In order to construct permutations $P'$ (on points encountered when computing
$\mathsf{EM}_{P',k'}(u)$), we compare the computations of $\mathsf{EM}_{P,k}(x^i)$ and $\mathsf{EM}_{P',k'}(u^i)$ for $i = 1, \ldots, \ell + 1$.
For $j = 1, \ldots, t$, we define $x^i_j$ as the output of $P_j$ when computing
$\mathsf{EM}_{P,k}(x^i)$, and similarly $u^i_j$ as the output of $P'_j$ when computing $\mathsf{EM}_{P',k'}(u^i)$, i.e.

$$x^i_j = P_j(k_{j-1} \oplus P_{j-1}(\cdots P_1(x^i \oplus k_0) \cdots))$$

$$\text{and} \quad u^i_j = P'_j(k_{j-1} \oplus P'_{j-1}(\cdots P'_1(u^i \oplus k_0) \cdots)) .$$

We also let $x^i_0 = x^i$ and $u^i_0 = u^i$. For $j = 0, \ldots, t - 1$ we use the following rules:

i) if $u^i_j \oplus k_j \in a_{j+1}$, then $u^i_{j+1} = P'_{j+1}(u^i_j \oplus k_j)$ is determined by the constraint
$P'(a) = b$;

ii) if $u^i_j \oplus k_j \notin a_{j+1}$ and $x^i_j \oplus k_j \in a_{j+1}$, then we choose $u^i_{j+1} = P'_{j+1}(u^i_j \oplus k_j)$
uniformly at random in $\{0,1\}^n \setminus (b_{j+1} \cup \{u^1_{j+1}, \ldots, u^{i-1}_{j+1}\})$;

iii) if $u^i_j \oplus k_j \notin a_{j+1}$ and $x^i_j \oplus k_j \notin a_{j+1}$, then we define $u^i_{j+1} = x^i_{j+1}$, that is
$P'_{j+1}(u^i_j \oplus k_j) = P_{j+1}(x^i_j \oplus k_j)$.

Property 2) can easily be seen to follow from these rules and the fact that the keys
are the same in both ciphers. Since $P$ is uniformly random among permutation
tuples satisfying $P(a) = b$, so is $P'$. This follows from the fact that when using
rule iii), $x^i_j \oplus k_j \notin a_{j+1}$ implies that $x^i_{j+1}$ is uniformly random in
$\{0,1\}^n \setminus (b_{j+1} \cup \{x^1_{j+1}, \ldots, x^{i-1}_{j+1}\})$, and hence $u^i_{j+1}$ is uniformly random in
$\{0,1\}^n \setminus (b_{j+1} \cup \{u^1_{j+1}, \ldots, u^{i-1}_{j+1}\})$ as well. This justifies Property 3). Hence, the joint
probability distribution we created for the random variable $(\mathsf{EM}_{P,k}(x'), \mathsf{EM}_{P',k'}(u))$ is such
that the marginal distributions of $\mathsf{EM}_{P,k}(x')$ and $\mathsf{EM}_{P',k'}(u)$ are respectively $\nu'_{\ell+1}$
and $\nu'_\ell$. We can now apply the Coupling Lemma to obtain:

$$\|\nu_{\ell+1} - \nu_\ell\| = \|\nu'_{\ell+1} - \nu'_\ell\| \le \Pr\left[(x^1_t, \ldots, x^{\ell+1}_t) \neq (u^1_t, \ldots, u^{\ell+1}_t)\right] ,$$

where we used $\mathsf{EM}_{P,k}(x^i) = x^i_t \oplus k_t$ and $\mathsf{EM}_{P',k'}(u^i) = u^i_t \oplus k_t$. Clearly,
the rules (combined with the fact that $u^i = x^i$ for $i = 1, \ldots, \ell$) imply that
$u^i_j = x^i_j$ for $i = 1, \ldots, \ell$ and $j = 0, \ldots, t$, so that the above expression simplifies
to $\|\nu_{\ell+1} - \nu_\ell\| \le \Pr[x^{\ell+1}_t \neq u^{\ell+1}_t]$. Hence, we are left with the task of upper-bounding
the probability not to equate $x^{\ell+1}_j$ and $u^{\ell+1}_j$ in any of the $t$ rounds.

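Consider the first round. Unless we have $u^{\ell+1}_0 \oplus k_0 \in a_1$ or $x^{\ell+1}_0 \oplus k_0 \in a_1$,
we will use rule iii) so that we will have $u^{\ell+1}_1 = x^{\ell+1}_1$. Since the size of $a_1$ is
$q_1$, and $k_0$ is uniformly random, we see that $\Pr[x^{\ell+1}_1 \neq u^{\ell+1}_1] \le 2q_1/N$. Assume
now that $x^{\ell+1}_j \neq u^{\ell+1}_j$ for some $j \in [1; t - 1]$. As in the preceding case, unless
$u^{\ell+1}_j \oplus k_j \in a_{j+1}$ or $x^{\ell+1}_j \oplus k_j \in a_{j+1}$, we will have $u^{\ell+1}_{j+1} = x^{\ell+1}_{j+1}$, so that
$\Pr[x^{\ell+1}_{j+1} \neq u^{\ell+1}_{j+1} \mid x^{\ell+1}_j \neq u^{\ell+1}_j] \le 2q_{j+1}/N$. Using a chain of conditional probabilities, we get:

$$\Pr[x^{\ell+1}_t \neq u^{\ell+1}_t] \le \prod_{j=0}^{t-1} \frac{2 q_{j+1}}{N} = \frac{2^t \prod_{i=1}^{t} q_i}{N^t} .$$

Finally, using Eq. (5) we see that

$$\|\mu_x(\cdot|P(a) = b) - \mu^*_{q_e}\| \le \sum_{\ell=0}^{q_e - 1} \frac{2^t \prod_{i=1}^{t} q_i}{N^t} = \frac{2^t q_e \prod_{i=1}^{t} q_i}{N^t} ,$$

as claimed. □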


Remark 2. It can easily be checked that the final key $k_t$ does not play any role
in the proof of Lemma 5. Hence it also holds for the iterated Even-Mansour cipher
without the last key.

Remark 3. The proof of Lemma 5 can be straightforwardly extended to handle
distinguishers that are allowed to make both forward and backward queries to
the outer permutation, in a non-adaptive way (such adversaries could be named
NCCA). However, notations become quite cumbersome, so we omit the
details.

Combining Lemmata 4 and 5 we obtain the following theorem.

Theorem 1. Let $q_1, \ldots, q_t, q_e$ be positive integers. Then:

$$\mathbf{Adv}^{\mathrm{ncpa}}_{\mathsf{EM}[t]}(q_1, \ldots, q_t, q_e) \le \frac{2^t q_e \prod_{i=1}^{t} q_i}{N^t} .$$

In particular, for any positive integer $q$:

$$\mathbf{Adv}^{\mathrm{ncpa}}_{\mathsf{EM}[t]}(q) \le \frac{2^t q^{t+1}}{N^t} .$$

This remains true for the iterated Even-Mansour cipher where the last key $k_t$ is
omitted.

More concretely, the iterated Even-Mansour cipher with $t$ rounds achieves NCPA-security
up to $N^{\frac{t}{t+1}}$ queries. This is optimal (neglecting constant factors)
considering the attack described in 0 .
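
To get a feeling for the numbers, the bound of Theorem 1 can be evaluated directly. The sketch below is only an illustration: the function names `ncpa_bound` and `ncpa_threshold_log2` are hypothetical, and it assumes the uniform case where all $q_i$ and $q_e$ equal the same value $q$, with $N = 2^n$.

```python
from fractions import Fraction


def ncpa_bound(n_bits, t, q):
    """Exact value of 2^t * q^(t+1) / N^t with N = 2^n_bits.

    The value can exceed 1, in which case the bound is vacuous."""
    n_points = 2 ** n_bits
    return Fraction(2 ** t * q ** (t + 1), n_points ** t)


def ncpa_threshold_log2(n_bits, t):
    """log2 of N^(t/(t+1)), the query count up to which the bound is useful
    (constant factors neglected)."""
    return n_bits * t / (t + 1)
```

For instance, with $n = 64$, $t = 2$ and $q = 2^{20}$ the bound evaluates to $2^{2} \cdot 2^{60} / 2^{128} = 2^{-66}$, while the threshold exponent is $64 \cdot 2/3 \approx 42.7$ bits.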


4 From Non-adaptive to Adaptive Distinguishers 


In this section, we turn to the case of CCA-distinguishers. For this, we will need
the following refinement to Lemma 4 which relies on a stronger assumption on
the distribution of the outputs of the iterated Even-Mansour cipher.


Lemma 6. Let $q_1, \ldots, q_t, q_e$ be positive integers. Assume that there exists $\beta$ such
that for any tuples $a, b \in (\mathcal{I}_n)^{*q_1,\ldots,q_t}$ and $x, y \in (\mathcal{I}_n)^{*q_e}$, one has

$$\Pr\left[P(a) = b \wedge \mathsf{EM}_{P,k}(x) = y\right] \ge \frac{1 - \beta}{(N)_{q_e} \prod_{i=1}^{t} (N)_{q_i}} .$$

Then $\mathbf{Adv}^{\mathrm{cca}}_{\mathsf{EM}[t]}(q_1, \ldots, q_t, q_e) \le \beta$.

Proof. The proof is very similar to the one of Lemma 4. Fix a $(q_1, \ldots, q_t, q_e)$-CCA-distinguisher
$\mathcal{D}$. Let $\tau$ be the transcript of the interaction of $\mathcal{D}$ with the
system of $t + 1$ permutations, i.e. the ordered sequence of $q_1 + \ldots + q_t + q_e$
queries with the corresponding answers $(i, b, z, z')$, where $i \in [1; t + 1]$ names
which permutation is being queried, $b$ is a bit indicating whether the query is
forward or backward, $z \in \{0,1\}^n$ is the actual query and $z'$ the answer. Let
also $\Phi$ be the function that maps a tuple of permutations $(P, P_{t+1}) \in (\mathcal{P}_n)^{t+1}$
to the transcript of the attack when $\mathcal{D}$ interacts with $(P, P_{t+1})$. We say that
a transcript is consistent if there exists a tuple of permutations $(P, P_{t+1})$ such
that $\Phi(P, P_{t+1}) = \tau$, and we denote by $\Gamma$ the set of consistent transcripts. Finally,
from a consistent transcript $\tau$, we build the sequences $a(\tau), b(\tau) \in (\mathcal{I}_n)^{*q_1,\ldots,q_t}$
and $x(\tau), y(\tau) \in (\mathcal{I}_n)^{*q_e}$ as follows. For $i = 1, \ldots, t$, let $(i, b, z, z')$ be the $j$-th
query and corresponding answer to $P_i$ in the transcript. If this is a forward
query ($b = 0$), then we define $a_i^j = z$ and $b_i^j = z'$; else, when this is a backward
query ($b = 1$), we define $a_i^j = z'$ and $b_i^j = z$. Similarly, let $(t + 1, b, z, z')$ be
the $j$-th query and corresponding answer to the outer permutation $P_{t+1}$ in the
transcript. If this is a forward query ($b = 0$), then we define $x^j = z$ and $y^j = z'$;
else, when this is a backward query ($b = 1$), we define $x^j = z'$ and $y^j = z$.
Note that for a consistent transcript $\tau$, $\Phi(P, P_{t+1}) = \tau$ iff $P(a(\tau)) = b(\tau)$ and
$P_{t+1}(x(\tau)) = y(\tau)$.

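The output of $\mathcal{D}$ is a deterministic function of the transcript. We let $\Sigma$ denote
the set of consistent transcripts $\tau$ such that $\mathcal{D}$ outputs 1 when the transcript is
$\tau$. Then, by definition we have:

$$\Pr^*[\mathcal{D}(1^n) = 1] = \sum_{\tau \in \Sigma} \frac{\#\{(P,Q) \in (\mathcal{P}_n)^{t+1} : \Phi(P,Q) = \tau\}}{|\mathcal{P}_n|^{t+1}}$$

$$= \sum_{\tau \in \Sigma} \frac{\#\{(P,Q) \in (\mathcal{P}_n)^{t+1} : P(a(\tau)) = b(\tau) \wedge Q(x(\tau)) = y(\tau)\}}{|\mathcal{P}_n|^{t+1}} = \frac{|\Sigma|}{(N)_{q_e} \prod_{i=1}^{t} (N)_{q_i}} . \qquad (6)$$

Also, we have:

$$\Pr[\mathcal{D}(1^n) = 1] = \sum_{\tau \in \Sigma} \frac{\#\{(P,k) \in \Omega_t : \Phi(P, \mathsf{EM}_{P,k}) = \tau\}}{|\Omega_t|} = \sum_{\tau \in \Sigma} \Pr\left[P(a(\tau)) = b(\tau) \wedge \mathsf{EM}_{P,k}(x(\tau)) = y(\tau)\right] . \qquad (7)$$

Using the assumption and Eqs. (6) and (7) we see that:

$$\Pr[\mathcal{D}(1^n) = 1] \ge \frac{(1 - \beta)|\Sigma|}{(N)_{q_e} \prod_{i=1}^{t} (N)_{q_i}} = (1 - \beta) \Pr^*[\mathcal{D}(1^n) = 1] ,$$

so that $\Pr^*[\mathcal{D}(1^n) = 1] - \Pr[\mathcal{D}(1^n) = 1] \le \beta$. Applying the same reasoning to
the distinguisher $\mathcal{D}'$ which outputs the negation of $\mathcal{D}$'s output, we obtain

$$(1 - \Pr^*[\mathcal{D}(1^n) = 1]) - (1 - \Pr[\mathcal{D}(1^n) = 1]) \le \beta ,$$

which implies that the advantage of $\mathcal{D}$ is at most $\beta$. This concludes the proof. □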

We will now derive an appropriate bound /3 refining Lemma El by doubling the 
number of rounds of the construction and using Lemma |21 




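Lemma 7. Let $t$ be an even integer and $t' = t/2$. Let $q_1, \ldots, q_t, q_e$ be positive
integers. We denote:

$$\alpha_1 = \frac{2^{t'} q_e \prod_{i=1}^{t'} q_i}{N^{t'}} \quad \text{and} \quad \alpha_2 = \frac{2^{t'} q_e \prod_{i=t'+1}^{2t'} q_i}{N^{t'}} .$$

Then for any tuples $a, b \in (\mathcal{I}_n)^{*q_1,\ldots,q_t}$ and $x, y \in (\mathcal{I}_n)^{*q_e}$, one has

$$\Pr\left[P(a) = b \wedge \mathsf{EM}_{P,k}(x) = y\right] \ge \frac{1 - \beta}{(N)_{q_e} \prod_{i=1}^{t} (N)_{q_i}} ,$$

where $\beta = 2(\sqrt{\alpha_1} + \sqrt{\alpha_2})$.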

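Proof. First, we slightly modify how the Even-Mansour cipher with $2t'$ rounds is
defined in order to write it as the composition of two Even-Mansour ciphers with
$t'$ rounds. For this, we write the middle key $k_{t'}$ between permutations $P_{t'}$ and
$P_{t'+1}$ as the xor of two independent keys $k^1_{t'}$ and $k^2_{t'}$, and we redefine $\mathsf{EM}_{P,k}$, where
$P = (P_1, \ldots, P_{2t'}) \in (\mathcal{P}_n)^{2t'}$ and $k = (k_0, \ldots, k_{t'-1}, k^1_{t'}, k^2_{t'}, k_{t'+1}, \ldots, k_{2t'}) \in (\mathcal{I}_n)^{2t'+2}$, as:

$$\mathsf{EM}_{P,k} = \underbrace{\oplus_{k_{2t'}} \circ P_{2t'} \circ \cdots \circ P_{t'+1} \circ \oplus_{k^2_{t'}}}_{\mathsf{EM}_{\boldsymbol{P}_2, \boldsymbol{k}_2}} \circ \underbrace{\oplus_{k^1_{t'}} \circ P_{t'} \circ \cdots \circ \oplus_{k_1} \circ P_1 \circ \oplus_{k_0}}_{\mathsf{EM}_{\boldsymbol{P}_1, \boldsymbol{k}_1}} .$$

Clearly, this does not change the quantity $\Pr[P(a) = b \wedge \mathsf{EM}_{P,k}(x) = y]$ since
$k^1_{t'} \oplus k^2_{t'}$ is uniformly distributed when $k^1_{t'}$ and $k^2_{t'}$ are. This enables us to write
$\mathsf{EM}_{P,k} = \mathsf{EM}_{\boldsymbol{P}_2, \boldsymbol{k}_2} \circ \mathsf{EM}_{\boldsymbol{P}_1, \boldsymbol{k}_1}$, where $\boldsymbol{P}_1 = (P_1, \ldots, P_{t'})$, $\boldsymbol{P}_2 = (P_{t'+1}, \ldots, P_{2t'})$,
$\boldsymbol{k}_1 = (k_0, \ldots, k_{t'-1}, k^1_{t'})$, $\boldsymbol{k}_2 = (k^2_{t'}, k_{t'+1}, \ldots, k_{2t'})$. In the following we denote
$\Omega_{2t'} = (\mathcal{P}_n)^{2t'} \times (\mathcal{I}_n)^{2t'+2}$. Note that $|\Omega_{2t'}| = |\Omega_{t'}|^2$.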

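Fix tuples $a, b \in (\mathcal{I}_n)^{*q_1,\ldots,q_t}$ and $x, y \in (\mathcal{I}_n)^{*q_e}$. We denote $\boldsymbol{a}_1 = (a_1, \ldots, a_{t'})$,
$\boldsymbol{a}_2 = (a_{t'+1}, \ldots, a_{2t'})$, $\boldsymbol{b}_1 = (b_1, \ldots, b_{t'})$, and $\boldsymbol{b}_2 = (b_{t'+1}, \ldots, b_{2t'})$. We will apply
Lemma 5 independently to each half of the cipher, $\mathsf{EM}_{\boldsymbol{P}_1, \boldsymbol{k}_1}$ and $\mathsf{EM}_{\boldsymbol{P}_2, \boldsymbol{k}_2}$. Consider
the first half $\mathsf{EM}_{\boldsymbol{P}_1, \boldsymbol{k}_1}$. By Lemma 5 we have $\|\mu^1_x(\cdot|\boldsymbol{P}_1(\boldsymbol{a}_1) = \boldsymbol{b}_1) - \mu^*_{q_e}\| \le \alpha_1$,
where $\mu^1_x(\cdot|\boldsymbol{P}_1(\boldsymbol{a}_1) = \boldsymbol{b}_1)$ is the distribution of $\mathsf{EM}_{\boldsymbol{P}_1, \boldsymbol{k}_1}(x)$ conditioned on $\boldsymbol{P}_1(\boldsymbol{a}_1) = \boldsymbol{b}_1$.
Hence Lemma 0 ensures that there is a subset $S_x \subset (\mathcal{I}_n)^{*q_e}$ of size at least
$(1 - \sqrt{\alpha_1})(N)_{q_e}$ such that for all $z \in S_x$:

$$\mu^1_x(z|\boldsymbol{P}_1(\boldsymbol{a}_1) = \boldsymbol{b}_1) = \frac{\#\{(\boldsymbol{P}_1, \boldsymbol{k}_1) \in \Omega_{t'} : \boldsymbol{P}_1(\boldsymbol{a}_1) = \boldsymbol{b}_1 \wedge \mathsf{EM}_{\boldsymbol{P}_1, \boldsymbol{k}_1}(x) = z\}}{|\Omega_{t'}| / \prod_{i=1}^{t'} (N)_{q_i}} \ge \frac{1 - \sqrt{\alpha_1}}{(N)_{q_e}} .$$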
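Applying a similar reasoning to the distribution $\mu^2_y(\cdot|\boldsymbol{P}_2(\boldsymbol{a}_2) = \boldsymbol{b}_2)$ of $\mathsf{EM}^{-1}_{\boldsymbol{P}_2, \boldsymbol{k}_2}(y)$
conditioned on $\boldsymbol{P}_2(\boldsymbol{a}_2) = \boldsymbol{b}_2$, we see that there exists a subset $S_y \subset (\mathcal{I}_n)^{*q_e}$ of size
at least $(1 - \sqrt{\alpha_2})(N)_{q_e}$ such that for all $z \in S_y$:

$$\mu^2_y(z|\boldsymbol{P}_2(\boldsymbol{a}_2) = \boldsymbol{b}_2) = \frac{\#\{(\boldsymbol{P}_2, \boldsymbol{k}_2) \in \Omega_{t'} : \boldsymbol{P}_2(\boldsymbol{a}_2) = \boldsymbol{b}_2 \wedge \mathsf{EM}^{-1}_{\boldsymbol{P}_2, \boldsymbol{k}_2}(y) = z\}}{|\Omega_{t'}| / \prod_{i=t'+1}^{2t'} (N)_{q_i}} \ge \frac{1 - \sqrt{\alpha_2}}{(N)_{q_e}} .$$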


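We can now lower-bound the number of $(P, k) \in \Omega_{2t'}$ satisfying $P(a) = b$ and
$\mathsf{EM}_{P,k}(x) = y$ by summing, over all intermediate values $z \in S_x \cap S_y$, the product
of the number of $(\boldsymbol{P}_1, \boldsymbol{k}_1) \in \Omega_{t'}$ satisfying $\boldsymbol{P}_1(\boldsymbol{a}_1) = \boldsymbol{b}_1$ and $\mathsf{EM}_{\boldsymbol{P}_1, \boldsymbol{k}_1}(x) = z$
times the number of $(\boldsymbol{P}_2, \boldsymbol{k}_2) \in \Omega_{t'}$ satisfying $\boldsymbol{P}_2(\boldsymbol{a}_2) = \boldsymbol{b}_2$ and $\mathsf{EM}_{\boldsymbol{P}_2, \boldsymbol{k}_2}(z) = y$.
Combining the two above equations yields:

$$\#\{(P, k) \in \Omega_{2t'} : P(a) = b \wedge \mathsf{EM}_{P,k}(x) = y\} \ge \frac{|S_x \cap S_y| (1 - \sqrt{\alpha_1})(1 - \sqrt{\alpha_2}) |\Omega_{t'}|^2}{\left((N)_{q_e}\right)^2 \prod_{i=1}^{2t'} (N)_{q_i}} .$$

Finally, noting that $|S_x \cap S_y| \ge (1 - \sqrt{\alpha_1} - \sqrt{\alpha_2})(N)_{q_e}$, dividing both terms by
$|\Omega_{t'}|^2 = |\Omega_{2t'}|$, and using

$$(1 - \sqrt{\alpha_1} - \sqrt{\alpha_2})(1 - \sqrt{\alpha_1})(1 - \sqrt{\alpha_2}) \ge 1 - 2(\sqrt{\alpha_1} + \sqrt{\alpha_2}) ,$$

we obtain:

$$\Pr\left[P(a) = b \wedge \mathsf{EM}_{P,k}(x) = y\right] \ge \frac{1 - \beta}{(N)_{q_e} \prod_{i=1}^{t} (N)_{q_i}}$$

with $\beta = 2(\sqrt{\alpha_1} + \sqrt{\alpha_2})$, which concludes the proof. □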

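Combining Lemmata 6 and 7 we finally obtain our main theorem.

Theorem 2. Let $t$ be an even integer and $t' = t/2$. Let $q_1, \ldots, q_t, q_e$ be positive
integers. Then:

$$\mathbf{Adv}^{\mathrm{cca}}_{\mathsf{EM}[t]}(q_1, \ldots, q_t, q_e) \le \left( \frac{2^{t'+2} q_e \prod_{i=1}^{t'} q_i}{N^{t'}} \right)^{1/2} + \left( \frac{2^{t'+2} q_e \prod_{i=t'+1}^{t} q_i}{N^{t'}} \right)^{1/2} .$$

In particular, for any positive integer $q$:

$$\mathbf{Adv}^{\mathrm{cca}}_{\mathsf{EM}[t]}(q) \le \frac{2^{\frac{t}{4}+2} q^{\frac{t+2}{4}}}{N^{\frac{t}{4}}} .$$

For odd $t$, we have $\mathbf{Adv}^{\mathrm{cca}}_{\mathsf{EM}[t]} \le \mathbf{Adv}^{\mathrm{cca}}_{\mathsf{EM}[t-1]}$ so that we can use the above bounds
with $t - 1$.

More concretely, the iterated Even-Mansour cipher with $t$ rounds achieves CCA-security
up to $N^{\frac{t}{t+2}}$ queries.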
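
The CCA bound can be evaluated in the same spirit. This is a hedged sketch rather than anything from the paper: `cca_bound` is a hypothetical helper, it assumes all $q_i$ and $q_e$ equal $q$ and $N = 2^n$, it computes $\beta = 2(\sqrt{\alpha_1} + \sqrt{\alpha_2}) = 4\sqrt{\alpha}$ with $\alpha = 2^{t'} q^{t'+1} / N^{t'}$, and it falls back to $t - 1$ rounds for odd $t$ as the remark above allows.

```python
import math


def cca_bound(n_bits, t, q):
    """Evaluate 4 * sqrt(2^t' * q^(t'+1) / N^t') for t' = floor(t/2),
    capped at 1 since an advantage never exceeds 1."""
    if t < 2:
        raise ValueError("the bound needs at least two rounds")
    half = t // 2  # for odd t this uses the bound for t - 1 rounds
    n_points = 2 ** n_bits
    alpha = (2 ** half) * (q ** (half + 1)) / (n_points ** half)
    return min(1.0, 4.0 * math.sqrt(alpha))
```

For $n = 64$, $t = 2$ and $q = 2^{10}$ this gives $\alpha = 2^{-43}$ and hence a bound of $2^{-19.5}$.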
A Proof of the Coupling Lemma

The original statement and proof of the Coupling Lemma is due to Aldous pQJ. 
Here we follow closely a proof by Vigoda.

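Let $\lambda$ be a coupling of $\mu$ and $\nu$, and $(X, Y) \sim \lambda$. By definition, we have that
for any $z \in \Omega$, $\lambda(z, z) \le \min\{\mu(z), \nu(z)\}$. Moreover, $\Pr[X = Y] = \sum_{z \in \Omega} \lambda(z, z)$.

Hence we have:

$$\Pr[X = Y] \le \sum_{z \in \Omega} \min\{\mu(z), \nu(z)\} .$$

Therefore:

$$\Pr[X \neq Y] \ge 1 - \sum_{z \in \Omega} \min\{\mu(z), \nu(z)\}$$

$$= \sum_{z \in \Omega} \left( \mu(z) - \min\{\mu(z), \nu(z)\} \right)$$

$$= \sum_{z \in \Omega,\ \mu(z) \ge \nu(z)} \left( \mu(z) - \nu(z) \right)$$

$$= \max_{S \subset \Omega} \{\mu(S) - \nu(S)\}$$

$$= \|\mu - \nu\| .$$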
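
The inequality can be observed on a small example. The sketch below is an illustration only: the helper names are hypothetical, distributions are represented as dictionaries mapping outcomes to exact `Fraction` probabilities, and the independent coupling is just one convenient (generally non-optimal) coupling.

```python
from fractions import Fraction


def total_variation(mu, nu):
    """||mu - nu|| computed as the sum of (mu(z) - nu(z)) over z with mu(z) >= nu(z)."""
    support = set(mu) | set(nu)
    return sum(max(mu.get(z, 0) - nu.get(z, 0), 0) for z in support)


def independent_coupling(mu, nu):
    """The product distribution: a valid coupling, usually far from optimal."""
    return {(x, y): mu[x] * nu[y] for x in mu for y in nu}


def prob_differ(coupling):
    """Pr[X != Y] for (X, Y) distributed according to the coupling."""
    return sum(p for (x, y), p in coupling.items() if x != y)


mu = {0: Fraction(1, 2), 1: Fraction(1, 2)}
nu = {0: Fraction(3, 4), 1: Fraction(1, 4)}
# A coupling that keeps X = Y as often as possible (checked by hand:
# its marginals are mu and nu).
maximal = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 4), (1, 0): Fraction(1, 4)}
```

Here the total variation distance is $1/4$; the independent coupling gives $\Pr[X \neq Y] = 1/2$, and the hand-built coupling `maximal` meets the lower bound with $\Pr[X \neq Y] = 1/4$.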


Available from www.cc.gatech.edu/~vigoda/MCMC_Course/MC-basics.pdf
Abstract. Among various cryptographic schemes, CBC-based MACs
are among the few most widely used in practice. Such MACs iterate
a blockcipher $E_K$ in the so-called Cipher-Block-Chaining way, i.e.
$C_i = E_K(M_i \oplus C_{i-1})$, offering high efficiency in practical applications. In this
paper, we propose a new deterministic variant of CBC-based MACs that
is provably secure beyond the birthday bound. The new MAC 3kf9 is
obtained by combining f9 (3GPP-MAC) and EMAC sharing the same
internal structure, and so it is almost as efficient as the original CBC
MAC. 3kf9 offers $O\left(\frac{\ell^3 q^3}{2^{2n}} + \frac{\ell q}{2^n}\right)$ PRF-security when its underlying n-bit
blockcipher is pseudorandom with three independent keys. This makes
it more secure than traditional CBC-based MACs, especially when they
are applied with lightweight blockciphers. Therefore, 3kf9 is expected to
be a possible candidate MAC in resource-restricted environments.

Keywords: MAC, Birthday Bound, CBC, Mode of Operation. 

1 Introduction 

1.1 Background 

Birthday Bound. In cryptography, the birthday attack is a generic attack that
exploits no specific properties of cryptographic schemes, but just takes
advantage of the birthday paradox in probability theory. This paradox says that
approximately $2^{n/2}$ independently random n-bit points will collide with a probability
close to 1, where $2^{n/2}$ is called the birthday bound 125125 1 . The birthday attack
itself is not fatal to the practical security of cryptographic schemes, because
people can choose long enough security parameters to defend against it, e.g. by restricting
the output length of hash functions to be no shorter than 224 bits 0 , or by preventing
attackers from getting a sufficient number of input-output pairs, to make
this attack infeasible in recent years.
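
As a rough numerical illustration of the birthday bound, the following sketch uses the standard approximation $1 - e^{-s(s-1)/2^{n+1}}$ for the probability that $s$ uniformly random n-bit values contain a collision. The function name `collision_probability` is hypothetical and the formula is an approximation, not an exact count.

```python
import math


def collision_probability(n_bits, samples):
    """Approximate probability of at least one collision among `samples`
    independent uniform n-bit values: 1 - exp(-s(s-1) / 2^(n+1))."""
    exponent = samples * (samples - 1) / 2 ** (n_bits + 1)
    return -math.expm1(-exponent)
```

For a 64-bit block, about $2^{32}$ samples already give a collision probability near 0.39, whereas $2^{20}$ samples give a probability below $10^{-7}$, which is why short blocks leave little margin.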

However, being constrained by some particular software/hardware environ- 
ments, there still exist many actual applications using short security parameters. 
For example, the 64-bit blockcipher KASUMI is currently a standard algorithm 
in mobile communication systems 0 . With the rapid developments of Internet 

X. Wang and K. Sako (Eds.): ASIACRYPT 2012, LNCS 7658, pp. 296-gT3] 2012. 
of Things, several lightweight primitives have been proposed in recent years, e.g.
PRESENT and PHOTON |1 H14j . These algorithms use small-size internal states
and output values, are usually much easier to realize in software and require
smaller area in hardware, offering better performance than normal-size ones.
Unfortunately, their small sizes imply vulnerability when they are used with
traditional modes of operation, most of which are only secure within the birthday
bound 1 19121 . To ensure practical security in such cases, those modes have
to be combined with stateful or random values, or to limit the lengths of their
input messages, or to update secret keys frequently, resulting in inconveniences
and security risks if misused.

MAC. A Message Authentication Code is a widely-used cryptographic scheme for
data integrity protection and data origin authentication. Practical applications
usually require MACs to be not only secure (outputting unpredictable tags for
new messages) but also efficient. A common way to design a MAC algorithm is
to iterate a blockcipher $E : \mathcal{K}_E \times \{0,1\}^n \to \{0,1\}^n$ in the Cipher-Block-Chaining
(CBC) manner. That is, in each step, a new chaining value $C_i$ is obtained by
encrypting the XOR result of the current message block $M_i$ and the previous
chaining value $C_{i-1}$, i.e. $C_i = E_K(M_i \oplus C_{i-1})$. The CBC structure is so common
in the design of many cryptographic schemes that it has been considerably
studied for many years |8I2 71911 fi!24| .

Up to now, many excellent CBC-based MACs have been proposed, e.g. EMAC, 
XCBC, OMAC, CMAC and GCBC )9 7191 Itil-l 1911 . Besides, PMAC takes a fully 
parallelizable construction and can offer extremely high speed in parallel envi- 
ronments m- All of the above MAC algorithms are deterministic (needing no 
stateful or random values), and provably secure when their underlying block- 
cipher is assumed to be a pseudorandom permutation (PRP). However, their 
security bounds all fall within the birthday bound, and can not be further im- 
proved because there exist birthday attacks on them, i.e. the birthday bound is 
tight for them |19t2j . 

There are also a few CBC-based MACs with provable security beyond the 
birthday bound. For example, RMAC replaces the second key in EMAC by 
XORing its first key and a random value f 18121 . and MAC-R1 and MAC-R2 inject 
n-bit randomness into the internal states of CBC-based MACs m- Obviously, 
their high security relies on not only the PRP security of blockciphers but also 
the randomness of the injected values. 

In fact, all the deterministic blockcipher-based MACs fell within the birthday
bound until Yasuda showed that algorithm 6 in the ISO standard is an exception,
conditioned on some restrictions on messages. In the same paper, Yasuda
also introduces SUM-ECBC to reduce the key size of algorithm 6, by XORing
the results from two CBC-based MACs, providing half of the efficiency that
normal CBC-based MACs offer in serial implementations (rate 2)$^1$. On the other
hand, Dodis and Steinberger build a MAC from unpredictable blockciphers, with
security beyond the birthday bound, but pay a very high efficiency cost H2J-
Very recently, Yasuda proposed PMAC_Plus, which improves PMAC beyond the
birthday bound inn. By pre-calculating a sufficiently large number (as many as
the number of message blocks) of masks, this MAC provides high efficiency
due to the fully parallelizable structure of PMAC and its rate-1 design.

$^1$ For each message of $\ell$ blocks, it has to call the underlying blockcipher roughly
$2\ell$ times.

3GPP-MAC. To promote the global system for mobile communications, the
3rd Generation Partnership Project (3GPP) proposed f9 as its first MAC algorithm,
which is based on the blockcipher KASUMI and produces 32-bit tags 0 . f9
inherits the structure of the original CBC MAC, but in the end encrypts the sum
of all chaining values, rather than the last chaining value, to obtain the tag. The
analysis of f9 tends to be tough due to this particular feature in. Knudsen
and Mitchell were the first to give birthday attacks on f9, which need $2^{(n+1)/2}$
known (Message, MAC) pairs and $2^{n/2+1}$ chosen (Message, MAC) pairs to make
a forgery against f9 without truncations |2()j . Then, Iwata and Kohno proved
that when KASUMI is secure against a special kind of related-key attacks (RK-PRP),
a generalized version of f9 (named f9') is PRF-secure within the
birthday bound [Ej. This implies the previous birthday attack is the best one
without knowledge of internal information.

Despite the fact that birthday attacks on MACs need on-line invocations,
making them much harder than those on hash functions (which need only off-line
computations), people still take several countermeasures to obtain a large enough
security margin. For example, in the practical applications of f9, it has been
demanded that each message should be prepended with a fresh value, the length
of messages should be no longer than 20000 bits, the secret key should be changed
after each invocation, and the outputs should be truncated |t>ltij .

1.2 Our Work 

In this paper, we attempt to design a rate-1 CBC-based MAC with provable 
security beyond the birthday bound. A direct application of such a scheme is to 
enforce the security level of current CBC-based MACs, especially in the situa- 
tions where small-size (lightweight) blockciphers are used, e.g. 3GPP and smart 
cards. Another application is to make it serve as a highly-secure pseudoran- 
dom number generator for various protocols, which therefore would improve the 
security level of the latter. 

To do this, stateful or random values (e.g. counter, fresh) can help, but we 
would not consider them for practical convenience. Another possible way is to 
enlarge the size of internal states but still output normal-size tags. As for CBC- 
based MACs, however, their internal states have the same size as their underly- 
ing blockcipher, so one may want to use a large-size blockcipher in CBC-based 
MACs and truncate their outputs. Unfortunately, the efficiency of such a solu- 
tion will not be satisfying, because a large-size blockcipher usually runs no faster 
than a small-size one, not to mention many other costs, e.g. memory and area 
requirements. 
Our starting point is f9, in favor of its double-blocksize internal state, which
provides a possible chance to resist the birthday attacks. Inspired by the design of
SUM-ECBC and PMAC_Plus, we append one more blockcipher invocation to the
end of the f9 structure, as illustrated in Fig. 1. The resulting MAC is named
3kf9, for it enhances f9 and needs three independent keys. From another
point of view, it is also an extension of EMAC E2, ignoring $E_{K_3}$ and the last
XOR operation.



When authenticating messages, 3kf9 can start to work without stateful values
or message length information (on-line), and requires no pre-computation and
only two blocks of memory for internal states, besides those for its underlying
blockcipher. In particular, it needs no multiplications, in contrast to PMAC_Plus.
Therefore, 3kf9 will provide high efficiency in serial implementations.

A more detailed comparison with related MACs is given in Table 1.


Table 1 . Comparison among 3kf9 and its related deterministic MACs 



key size 

rate 

structure 

multi. 

upper bounds 

bBB. “ 

Ref. 

Alg. 6 in ISO std. 6 
SUM-ECBC 

6 k 

4 k 

2 

CBC 

none 

O(^r) or^ ^ 
restricted O(-^r) 

conditional 

m 

m 

PMAC_Plus 

3kf9 

3 k 

1 

parallel 

CBC 

41 - 1 

o(^£ + ^) 

yes 

This Work 

/9 

EMAC 

k c 

2k 

1 

CBC 

none 

o(¥) 

no 

15 

m 


a bBB stands for “beyond the Birthday Bound” . 
b It has been removed from the latest version ISO/IEC 9797-1:2011. 
c Its second key is obtained by $K_2 = K_1 \oplus KM$, where $KM$ is a non-zero
k-bit value.
1.3 Organization

The rest of this paper is organized as follows. Section 2 introduces necessary 
symbols and 3kf9 specification. Section 3 gives our provable security analysis for 
3kf9, including security definitions, the main result and its proof. The proof will 
be completed in Section 4. In Section 5, we give some suggestions for practical 
usages of 3kf9. Finally, we conclude this work in Section 6. 

2 Symbols and Specification 

$\{0,1\}^n$ is the set of all n-bit strings and $\{0,1\}^*$ is the set of all strings. For strings
$a, b \in \{0,1\}^*$, $a \| b$ is the concatenation of $a$ and $b$, and $|a|$ is the length of $a$ in bits. If
$a, b$ have equal lengths then $a \oplus b$ is their bitwise XOR. Denote by Perm(n) and
Rand(n, n) the sets of all permutations and functions over $\{0,1\}^n$ respectively.
Rand(*, n) stands for the set of all functions whose range belongs to $\{0,1\}^n$. If
$A$ is a set, then $\#A$ denotes the size of $A$, and $x \xleftarrow{\$} A$ means that $x$ is chosen
from $A$ uniformly at random.

A message $M$ can be alternatively seen as a bit string $M \in \{0,1\}^*$. Then,
by $M \leftarrow M \| 10^{n-1-|M| \bmod n}$ we mean we append a single bit “1” to the end
of $M$, followed by $n - 1 - (|M| \bmod n)$ “0” bits, such that the length
of the padded string is a multiple of $n$. For any such string $M$ ($|M| = nL$),
$M_1 M_2 \cdots M_L \leftarrow \mathrm{Partition}(M)$ means we break $M$ into $L$ successive n-bit
blocks such that $M_1 \| M_2 \| \cdots \| M_L = M$.

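MAC Algorithm 3kf9[E]

Input: $K_1, K_2, K_3 \xleftarrow{\$} \mathcal{K}$, $M \in \{0,1\}^*$
Output: $T \in \{0,1\}^n$

01. $M \leftarrow M \| 10^{n-1-|M| \bmod n}$
02. $M_1 \cdots M_L \leftarrow \mathrm{Partition}(M)$
03. $S \leftarrow 0^n$
04. $Y_0 \leftarrow 0^n$
05. for $i \leftarrow 1$ to $L$ do
06.     $X_i \leftarrow Y_{i-1} \oplus M_i$
07.     $Y_i \leftarrow E_{K_1}(X_i)$
08.     $S \leftarrow S \oplus Y_i$
09. end for
10. $T \leftarrow E_{K_2}(Y_L) \oplus E_{K_3}(S)$
11. return $T$

Fig. 2. Specification of 3kf9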


For any message $M \in \{0,1\}^*$, 3kf9 takes a blockcipher $E : \mathcal{K}_E \times \{0,1\}^n \to \{0,1\}^n$
as its underlying primitive, calling it iteratively as specified in Fig. 2 to
deal with $M$, and finally outputs $T \in \{0,1\}^n$ as a tag. If necessary, $T$ can be
truncated to some particular length less than $n$.
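
The specification in Fig. 2 can be turned into a short executable sketch. Everything below is an illustration under stated assumptions, not a reference implementation: the three keyed blockcipher instances are replaced by toy random permutations on a tiny block size (hypothetical helper `make_toy_permutation`, seeded only for reproducibility), messages are given as strings of '0'/'1' characters, and the names `pad_and_partition` and `three_kf9` are not from the paper. It must not be used for real authentication.

```python
import random


def make_toy_permutation(n_bits, seed):
    """Stand-in for E_K: a fixed pseudo-random permutation of {0,1}^n as a lookup table."""
    table = list(range(2 ** n_bits))
    random.Random(seed).shuffle(table)
    return table.__getitem__


def pad_and_partition(message_bits, n_bits):
    """Steps 01-02: append '1', then zeros up to a multiple of n, and split into n-bit blocks."""
    padded = message_bits + "1"
    padded += "0" * (-len(padded) % n_bits)
    return [int(padded[i:i + n_bits], 2) for i in range(0, len(padded), n_bits)]


def three_kf9(perm1, perm2, perm3, message_bits, n_bits):
    """Steps 03-11: CBC chain under perm1, XOR-sum S of all chaining values,
    tag = perm2(last chaining value) XOR perm3(S)."""
    y = 0  # Y_0 = 0^n
    s = 0  # S = 0^n
    for block in pad_and_partition(message_bits, n_bits):
        y = perm1(y ^ block)  # X_i = Y_{i-1} xor M_i, Y_i = E_{K1}(X_i)
        s ^= y                # S = S xor Y_i
    return perm2(y) ^ perm3(s)
```

A call such as `three_kf9(p1, p2, p3, "0110", 8)` with three independently seeded toy permutations returns an 8-bit tag as an integer; for a message that fits in one block, the chain has a single step and the tag reduces to `p2(y) ^ p3(y)` with `y = p1(M_1)`.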
3kf9 needs three keys $K_1$, $K_2$ and $K_3$, each of which should be independently
selected from $\mathcal{K} = \mathcal{K}_E$ uniformly at random. We use 3kf9[$E_{K_1}, E_{K_2}, E_{K_3}$] to
stand for this MAC algorithm and we also write it as 3kf9[E] for short.

3 Security Proof 

3.1 Security Definitions 

We need to introduce PRP/PRF definitions here, which are frequently used in
the analysis of modes of operation for blockciphers |HI2 71911 HI24| .

These two definitions focus on the randomness of a keyed function $f_K$, which
is selected from a function family $f : \mathcal{K}_f \times \{0,1\}^* \to \{0,1\}^n$ by selecting a
random key $K$. To measure its randomness, $f_K$ is compared with a random
function $R \xleftarrow{\$} \mathrm{Rand}(*, n)$ (or a random permutation $P \xleftarrow{\$} \mathrm{Perm}(n)$ if $f$ consists
of only permutations).

The comparison is done, informally, by allowing adversaries (without knowing
$K$) to query an oracle, which is either $f_K$ or $R$ with equal probability. The
oracle will answer with the corresponding outputs. After some number of queries,
the adversaries are asked to tell what the oracle is. The precise definition is
given by

$$\mathbf{Adv}^{\mathrm{prf}}_f(A) \stackrel{\mathrm{def}}{=} \left| \Pr[K \xleftarrow{\$} \mathcal{K}_f : A^{f_K(\cdot)} = 1] - \Pr[R \xleftarrow{\$} \mathrm{Rand}(*, n) : A^{R(\cdot)} = 1] \right| ,$$

$$\mathbf{Adv}^{\mathrm{prf}}_f(t, q, \mu) \stackrel{\mathrm{def}}{=} \max_A \{\mathbf{Adv}^{\mathrm{prf}}_f(A)\} ,$$

$$\mathbf{Adv}^{\mathrm{prp}}_f(A) \stackrel{\mathrm{def}}{=} \left| \Pr[K \xleftarrow{\$} \mathcal{K}_f : A^{f_K(\cdot)} = 1] - \Pr[P \xleftarrow{\$} \mathrm{Perm}(n) : A^{P(\cdot)} = 1] \right| ,$$

$$\mathbf{Adv}^{\mathrm{prp}}_f(t, q, \mu) \stackrel{\mathrm{def}}{=} \max_A \{\mathbf{Adv}^{\mathrm{prp}}_f(A)\} ,$$

and the maximum is over all adversaries taking time at most $t$, making at most $q$
oracle queries, whose total length is at most $\mu$ bits. If $\mathbf{Adv}^{\mathrm{prf}}_f(t, q, \mu)$ (or
$\mathbf{Adv}^{\mathrm{prp}}_f(t, q, \mu)$) is sufficiently small, we say the function family $f$ is a pseudorandom
function (PRF) (or a pseudorandom permutation (PRP)).
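
To make the notion of advantage concrete, consider a simple textbook adversary that is not taken from this paper: it queries $q$ distinct inputs and outputs 1 iff two answers collide. Against a random permutation it never outputs 1, while against a random function it outputs 1 with the collision probability, so its advantage in telling the two apart can be computed exactly. The helper name `collision_adversary_advantage` is hypothetical.

```python
from fractions import Fraction


def collision_adversary_advantage(n_bits, q):
    """Exact advantage of the collision-detecting adversary:
    |0 - Pr[q uniform n-bit outputs contain a collision]|."""
    n_points = 2 ** n_bits
    no_collision = Fraction(1)
    for i in range(q):
        # The (i+1)-th output must avoid the i previous distinct outputs.
        no_collision *= Fraction(n_points - i, n_points)
    return 1 - no_collision
```

For $n = 2$ and $q = 2$ the advantage is $1/4$, and it reaches 1 as soon as $q$ exceeds $2^n$; in general it stays below $q(q-1)/2^{n+1}$, which is the birthday-type term seen in many security bounds.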

It has been proved that a PRF is a secure MAC |B| - 


3.2 Main Results 

Let 3kf9[$P_1, P_2, P_3$] stand for 3kf9[$E_{K_1}, E_{K_2}, E_{K_3}$] when the blockcipher $E$ with three
independent keys is replaced by three independently random permutations $P_1$,
$P_2$ and $P_3$, and we further write it as 3kf9[P] for simplicity. Then, the following
theorem says that 3kf9[P] is a PRF with an upper bound beyond the birthday
bound.

Theorem 1 (Main Theorem). For any computationally unbounded adversary
$A$, after querying the oracle $q$ times, with each query no longer than $\ell_{\max}$ blocks,
its advantage to distinguish 3kf9[P] from a random function $R \xleftarrow{\$} \mathrm{Rand}(*, n)$ is
upper bounded by

| Pr[^4 3kf9 [ p ] = 1] - Pr[.4 fl = 1]| < + 2 g 3 ^ax+g 3 |ax + 2g 3 Uax+2 g 3 

We conclude this theorem by the “coefficient H technique” initially proposed by
Patarin |25l2tij . This method is a useful tool for proving pseudorandom properties
of blockcipher structures and modes of operation, and it has been frequently used
before.

To simplify our proof, we also adopt the framework used in the proofs for
SUM-ECBC and PMAC_Plus |3QI31j . which separates the inputs to $P_2$ and $P_3$
into four cases. Taking advantage of some known results for the CBC structure, f9
and the sum of PRPs f9115l221 , the first three cases can be easily upper bounded.
For the last case, we prove it by Lemma 1 in the next section.

Proof. Since $A$ is computationally unbounded, w.l.o.g. we assume $A$ is a deterministic
algorithm; otherwise we can maximize $A$ by running it over all possible
cases and choosing the most powerful one. Based on this, the $i$-th query
$M^i \notin \{M^1, M^2, \cdots, M^{i-1}\}$ that $A$ would make is fully determined by the previous
$i - 1$ input-output pairs $(M^1, T^1), (M^2, T^2), \cdots, (M^{i-1}, T^{i-1})$. Then, if we fix
a $q$-tuple $\vec{T} = (T^1, T^2, \cdots, T^q)$, we know

- all $A$'s queries are uniquely determined,

- the number of queries $q$ is uniquely determined, and

- the output of $A$ (0 or 1) is uniquely determined.

Denote by $\mathrm{Tset}_1$ the set that contains all $q$-tuples $\vec{T} =
(T^1, T^2, \cdots, T^q)$ such that $A$ outputs 1, and let $N = \#\mathrm{Tset}_1$. Then we have
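Evaluation for random function R.

$$\Pr[A^R = 1] = \sum_{\vec{T} \in \mathrm{Tset}_1} \Pr[R(M^i) = T^i, i = 1, 2, \cdots, q] = \frac{N}{2^{nq}} .$$

Evaluation for 3kf9[P].

$$\Pr[A^{\mathrm{3kf9}[P]} = 1] = \sum_{\vec{T} \in \mathrm{Tset}_1} \Pr[\mathrm{3kf9}[P](M^i) = T^i, i = 1, 2, \cdots, q]$$

$$\ge \sum_{\vec{T} \in \mathrm{Tset}_1} \left( \Pr[\mathrm{3kf9}[P] \text{ outputs } q \text{ random values}] \times \left( \frac{1}{2^n} \right)^q \right)$$

$$= \frac{N}{2^{nq}} \cdot \Pr[\mathrm{3kf9}[P] \text{ outputs } q \text{ random values}] . \qquad (1)$$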
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Denote CBC [Pi] as the internal structure of 3kf9[P], i.e. (Q. S) <- CBC[Pi](M), 
and 3kf9 [P)(M) = P 2 (Q) ® P 3 (S) = T, as in Fig. Q] In the following analy- 
sis, we do step by step for each i = 1, 2, • • • , q. Suppose in the previous i — 1 
queries, the i — 1 outputs T 1 , T 2 , ■ ■ ■ , T* -1 are independently random values. Let 
Domain[P 2 ] = {Q 1 , Q 2 , ■ ■■ , Q * -1 } and Domain[P 3 ] = {S' 1 , S 2 , ■ ■ ■ , S* -1 }. Then, 
for the Pth query M% its corresponding (Q\ S*) <— CBC[Pi](ilP) will definitely 
fall into one of the following four cases, 

Case A: Q l G Domain[P 2 ] and S l £ Domain[P 3 ], 

Case B: Q l Domain[P 2 ] and S* G Domain[P 3 ], 

Case C: Q l £ Domain[P 2 ] and S l £ Domain[P 3 ], 

Case D: Q l G Domain[P 2 ] and S l G Domain[P 3 j. 
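The structure and the four cases can be made concrete with a toy model. The sketch below is only an illustration under stated assumptions: it uses a block size of n = 8 bits, table-based random permutations in place of $P_1, P_2, P_3$, and byte-aligned messages; the function names are hypothetical and not taken from the paper.

```python
import random

N_BITS = 8  # toy block size n, chosen only so the tables stay tiny (assumption)

def random_permutation(rng: random.Random) -> list:
    """Sample a uniformly random permutation of {0,1}^n as a lookup table."""
    table = list(range(2 ** N_BITS))
    rng.shuffle(table)
    return table

def pad_and_split(message: bytes) -> list:
    """10* padding; with n = 8 every byte is one block, so one 0x80 block is appended."""
    return list(message + b"\x80")

def cbc_core(p1: list, message: bytes) -> tuple:
    """Return (Q, S): the last chaining value and the XOR of all chaining values."""
    y, s = 0, 0                      # Y_0 = S_0 = 0^n
    for block in pad_and_split(message):
        y = p1[y ^ block]            # Y_l = P1(Y_{l-1} xor M_l)
        s ^= y                       # S_l = S_{l-1} xor Y_l
    return y, s

def tag_3kf9(p1, p2, p3, message: bytes) -> int:
    """Toy tag T = P2(Q) xor P3(S)."""
    q, s = cbc_core(p1, message)
    return p2[q] ^ p3[s]

def classify_queries(p1, messages) -> list:
    """Label each query A, B, C or D by membership of (Q, S) in the earlier domains."""
    domain_p2, domain_p3, labels = set(), set(), []
    for m in messages:
        q, s = cbc_core(p1, m)
        in_q, in_s = q in domain_p2, s in domain_p3
        labels.append("D" if in_q and in_s else "A" if in_q else "B" if in_s else "C")
        domain_p2.add(q)
        domain_p3.add(s)
    return labels
```

The first query is always labelled C because both domains are still empty; Case D is the only case in which neither $P_2$ nor $P_3$ is evaluated on a fresh input.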

For Case A, Black and Rogaway have shown that the probability for any two 
messages to collide in the CBC structure (with an independent ending blockcipher 
invocation, e.g. EMAC, ECBC) is upper bounded by the birthday bound, i.e. 
$\Pr[Q^j = Q^i] \le \frac{4(\ell_{\max}+1)^2}{2^n}$ (see Lemma 3 in [9]). In such a case, we still have 
randomness for $T^i = P_2(Q^i) \oplus P_3(S^i)$ because $S^i \notin \mathrm{Domain}[P_3]$ and we can do 
lazy sampling for $P_3(S^i)$. Since at this moment $\#\mathrm{Domain}[P_3] \le i-1$, the advantage 
to distinguish $P_3(S^i)$ from a random value $r \xleftarrow{\$} \{0,1\}^n$ is no more than $\frac{i-1}{2^n}$. Then, 
the advantage to distinguish $T^i$ from $r$ is upper bounded by $\binom{i-1}{1}\frac{4(\ell_{\max}+1)^2}{2^n} \times \frac{i-1}{2^n}$. 

For Case B, Iwata and Kohno have pointed out that the probability for any two 
messages to collide in f9 (with an independent ending blockcipher invocation) 
is also upper bounded by the birthday bound, i.e. $\Pr[S^j = S^i] \le \frac{2\ell_{\max}^2 + 2\ell_{\max} + 4}{2^n}$ 
(see Lemma B.1 in [15], and note that we apply $\sigma \le 2\ell_{\max} + 2$ 
and $q = 2$ here). Then, by lazy sampling for $P_2(Q^i)$, we know the advantage to 
distinguish $T^i$ from $r$ is upper bounded by $\binom{i-1}{1}\frac{2\ell_{\max}^2 + 2\ell_{\max} + 4}{2^n} \times \frac{i-1}{2^n}$. 

For Case C, Lucks has proved that the advantage to distinguish $T^i = P_2(Q^i) \oplus P_3(S^i)$ 
from $r$ is upper bounded by $\frac{4(i-1)^2}{2^{2n}}$ (see the proof of 
Theorem 5 in [22]). 

As for Case D, we will show by Lemma 1 in the next section that 
$\Pr[\exists i \in [1, q] : \text{Case D occurs}] \le \frac{q\ell_{\max} + q}{2^{n-2}} + \frac{q^3\ell_{\max}^3}{2^{2n-2}}$. 

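Denote by $[T^i \ne r]$ the event that $T^i$ is not an independently random value. 
Then, conditioned on the non-occurrence of Case D, we get 

$$\begin{aligned}
\Pr[T^i \ne r] &= \Pr[\text{Case A}]\Pr[T^i \ne r \mid \text{Case A}] + \Pr[\text{Case B}]\Pr[T^i \ne r \mid \text{Case B}] + \Pr[\text{Case C}]\Pr[T^i \ne r \mid \text{Case C}] \\
&\le \binom{i-1}{1}\frac{4(\ell_{\max}+1)^2}{2^n}\cdot\frac{i-1}{2^n} + \binom{i-1}{1}\frac{2\ell_{\max}^2 + 2\ell_{\max} + 4}{2^n}\cdot\frac{i-1}{2^n} + \frac{4(i-1)^2}{2^{2n}} \\
&= \frac{(i-1)^2(3\ell_{\max}^2 + 5\ell_{\max} + 6)}{2^{2n-1}}.
\end{aligned}$$

This allows us to have 

$$\begin{aligned}
\Pr[\mathrm{3kf9}[P] \text{ does not output } q \text{ random values}] &\le \Pr[\text{Case D}] + \sum_{i=1}^{q} \Pr[T^i \ne r] \\
&\le \frac{q\ell_{\max} + q}{2^{n-2}} + \frac{q^3\ell_{\max}^3}{2^{2n-2}} + \sum_{i=1}^{q} \frac{(i-1)^2(3\ell_{\max}^2 + 5\ell_{\max} + 6)}{2^{2n-1}} \\
&\le \frac{q\ell_{\max} + q}{2^{n-2}} + \frac{2q^3\ell_{\max}^3 + q^3\ell_{\max}^2 + 2q^3\ell_{\max} + 2q^3}{2^{2n-1}}.
\end{aligned}$$

Writing $\varepsilon$ for this last bound, we obtain $\Pr[\mathcal{A}^{\mathrm{3kf9}[P]} = 1] \ge \frac{N}{2^{nq}} \times (1 - \varepsilon)$ by applying it to inequality (1). 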
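The final step of the bound on the probability that 3kf9[P] fails to output $q$ random values compresses some elementary arithmetic. A sketch of that arithmetic, assuming the Case D term is $q^3\ell_{\max}^3/2^{2n-2}$ and the per-query term is $(i-1)^2(3\ell_{\max}^2 + 5\ell_{\max} + 6)/2^{2n-1}$:

```latex
\begin{align*}
\sum_{i=1}^{q}(i-1)^2 &= \frac{(q-1)\,q\,(2q-1)}{6} \;\le\; \frac{q^3}{3},\\[2pt]
\sum_{i=1}^{q}\frac{(i-1)^2\,(3\ell_{\max}^2+5\ell_{\max}+6)}{2^{2n-1}}
  &\le \frac{q^3\ell_{\max}^2+\tfrac{5}{3}\,q^3\ell_{\max}+2q^3}{2^{2n-1}}
  \;\le\; \frac{q^3\ell_{\max}^2+2q^3\ell_{\max}+2q^3}{2^{2n-1}},\\[2pt]
\frac{q^3\ell_{\max}^3}{2^{2n-2}} &= \frac{2q^3\ell_{\max}^3}{2^{2n-1}}.
\end{align*}
```

Adding the last two lines to $(q\ell_{\max}+q)/2^{n-2}$ gives the four-term numerator over $2^{2n-1}$.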


Comparison 

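By the above analysis, we get 

$$\Pr[\mathcal{A}^{R} = 1] - \Pr[\mathcal{A}^{\mathrm{3kf9}[P]} = 1] \le \frac{N}{2^{nq}} - \frac{N}{2^{nq}} \times (1 - \varepsilon) \le 1 \times \varepsilon \le \varepsilon.$$

On the other side, if we define $\mathrm{Tset}_0$ analogously, a similar analysis gives 

$$\Pr[\mathcal{A}^{R} = 0] - \Pr[\mathcal{A}^{\mathrm{3kf9}[P]} = 0] \le \varepsilon,$$

which implies $(1 - \Pr[\mathcal{A}^{R} = 1]) - (1 - \Pr[\mathcal{A}^{\mathrm{3kf9}[P]} = 1]) \le \varepsilon$. Thus we get 

$$\Pr[\mathcal{A}^{\mathrm{3kf9}[P]} = 1] - \Pr[\mathcal{A}^{R} = 1] \le \varepsilon.$$

Finally, we conclude 

$$\left|\Pr[\mathcal{A}^{\mathrm{3kf9}[P]} = 1] - \Pr[\mathcal{A}^{R} = 1]\right| \le \frac{q\ell_{\max} + q}{2^{n-2}} + \frac{2q^3\ell_{\max}^3 + q^3\ell_{\max}^2 + 2q^3\ell_{\max} + 2q^3}{2^{2n-1}}.$$

□ 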


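Based on the main theorem, we can say that 3kf9[E] is a PRF if the blockcipher E 
is secure. More precisely, we have 


Theorem 2. If the blockcipher $E : \mathcal{K}_E \times \{0,1\}^n \to \{0,1\}^n$ is a PRP, then 3kf9[E] 
is a PRF for all adversaries that make at most $q$ queries, each of which is no 
longer than $\ell_{\max}$ blocks. That is, 

$$\mathrm{Adv}^{\mathrm{prf}}_{\mathrm{3kf9}[E]}(t, q, \mu) \le \frac{q\ell_{\max} + q}{2^{n-2}} + \frac{2q^3\ell_{\max}^3 + q^3\ell_{\max}^2 + 2q^3\ell_{\max} + 2q^3}{2^{2n-1}} + 3\,\mathrm{Adv}^{\mathrm{prp}}_{E}(t', q', \mu'),$$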


where $t' = t + O(\mu)$, $q' \le q(\ell_{\max} + 1)$, and $\mu' \le \mu + qn$. 
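To get a feel for the information-theoretic part of the bound, it can be evaluated exactly for concrete parameters. The sketch below assumes the bound has the form $(q\ell_{\max}+q)/2^{n-2} + (2q^3\ell_{\max}^3 + q^3\ell_{\max}^2 + 2q^3\ell_{\max} + 2q^3)/2^{2n-1}$; the function name `prf_bound` is hypothetical, and the PRP advantage term of Theorem 2 is deliberately left out.

```python
from fractions import Fraction

def prf_bound(n: int, q: int, l_max: int) -> Fraction:
    """Exact value of the assumed Theorem 1 bound for block size n,
    q queries and at most l_max blocks per query."""
    linear = Fraction(q * l_max + q, 2 ** (n - 2))
    cubic_numerator = (2 * q**3 * l_max**3 + q**3 * l_max**2
                       + 2 * q**3 * l_max + 2 * q**3)
    cubic = Fraction(cubic_numerator, 2 ** (2 * n - 1))
    return linear + cubic

if __name__ == "__main__":
    # Example: a 128-bit blockcipher, 2^40 queries of up to 2^10 blocks each.
    bound = prf_bound(128, 2**40, 2**10)
    print(float(bound))
```

For these example parameters the linear term dominates, which is why the bound stays far below what a birthday-type bound would give at the same number of queries.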


4 Key Lemma 

The non-occurrence of Case D implies that the $q$ pairs $(Q^i, S^i)$ $(i = 1, 2, \dots, q)$ are 
free. By “free”, we mean that for each $i \in [1, q]$, either $Q^i$ is unique in its 
corresponding sequence $Q^1, Q^2, \dots, Q^q$ or $S^i$ is unique in its corresponding sequence 
$S^1, S^2, \dots, S^q$. This property is closely related to the recently introduced cover-free 
notion, which says that the $q$ outputs $(N^i_1, N^i_2, \dots, N^i_w)$ $(1 \le i \le q)$ from a cover-free 
function should satisfy the following property: for each $i$, there exists at least 
one $j \in [1, w]$ such that $N^i_j$ is unique in its own subsequence $N^1_j, \dots, N^q_j$. 

Unfortunately, the internal structure CBC[$P_1$] cannot satisfy the cover-free 
property when its outputs are made public. However, if adversaries cannot get its 
internal states, CBC[$P_1$] holds a similar property, as the following lemma says. 

Lemma 1. If $P_1$, $P_2$ and $P_3$ are independently random permutations from Perm($n$), 
then for all computationally unbounded adversaries that query 3kf9[P] no more 
than $q$ times, with each query no longer than $\ell_{\max}$ blocks, the probability for the internal 
states $(Q^i, S^i)$ $(i = 1, 2, \dots, q)$ to satisfy Case D is upper bounded by 

$$\Pr[\exists i \in [1, q] : \text{Case D occurs}] \le \frac{q\ell_{\max} + q}{2^{n-2}} + \frac{q^3\ell_{\max}^3}{2^{2n-2}}.$$

In the following proof, we will prove an even stronger result. That is, all the pairs 
$(Y^i_l, S^i_l)$ for $l = 1, 2, \dots, L^i$ and $i = 1, 2, \dots, q$ are free with this probability, 
excluding the trivial case that $(Y^i_l, S^i_l) = (Y^j_l, S^j_l)$ with $l \le d$ for two different 
messages $M^i$ and $M^j$, which after being padded are written as $M^i_1 \| M^i_2 \| \cdots \| M^i_{L^i}$ and 
$M^j_1 \| M^j_2 \| \cdots \| M^j_{L^j}$ and have the common prefix $M^i_1 \| M^i_2 \| \cdots \| M^i_d = M^j_1 \| M^j_2 \| \cdots \| M^j_d$ 
for some $d \le \min\{L^i, L^j\}$. To do this, we examine in detail how CBC[$P_1$] 
processes the queried messages $M^1, M^2, \dots, M^q$ step by step, and record 
every $Y^i_l$ and $S^i_l$ for $l = 1, 2, \dots, L^i$ and $i = 1, 2, \dots, q$ in two sets YRange 
and SRange. By lazy sampling for $P_1$, we upper bound the probability for the 
events $Y^i_l \in \mathrm{YRange}$ and $S^i_l \in \mathrm{SRange}$ to occur at the same time, and in the end 
we sum up all these probabilities to get the final result. 

Proof. For any $q$ pairwise distinct queries $M^1, M^2, \dots, M^q$, we use a program 
to show how CBC[$P_1$] processes them, as in Fig. 3. To better 
analyze the target probability, we do lazy sampling for $P_1$. Furthermore, we 
introduce three flags Zero, Cover and Bad. Zero is used to identify whether there 
exists $Y^i_l = 0^n$, which may be easily used to undermine the freeness consistence 
of $(Y^i_l, S^i_l)$ for $l = 1, 2, \dots, L^i$ and $i = 1, 2, \dots, q$. Cover is used directly to 
identify the freeness of $(Y^i_l, S^i_l)$. Either [Zero = True] or [Cover = True] implies 
[Bad = True], so $\Pr[\exists i \in [1, q] : \text{Case D occurs}] = \Pr[\text{Bad} = \text{True}] \le \Pr[\text{Zero} = \text{True}] + \Pr[\text{Cover} = \text{True}]$. 

Then, it is easy to get that $\Pr[\text{Zero} = \text{True}] \le \sum_{j=1}^{q(\ell_{\max}+1)} \frac{1}{2^n - (j-1)} \le \frac{q(\ell_{\max}+1)}{2^{n-1}}$, 
because for the $q$ messages whose length is no more than $\ell_{\max} + 1$ blocks after being 
padded, we do no more than $q(\ell_{\max} + 1)$ lazy samplings for $P_1$, and in the $j$-th 
sampling for a new output $Y$, $\Pr[Y = 0^n] \le \frac{1}{2^n - (j-1)}$. Here we use $q(\ell_{\max} + 1) \le 2^{n-1}$ 
to get the final bound. 

To upper bound Pr[Cover = True] for all $(Y^i_l, S^i_l)$, we will upper bound the 
probability for each lazy sampling that may result in the occurrence of 
$[Y^i_l \in \mathrm{YRange} \wedge S^i_l \in \mathrm{SRange}]$ with $l = 1, 2, \dots, L^i$ and $i = 1, 2, \dots, q$, and then sum 
them up. To make the following analysis easier to follow, we work on a simple 
case first (see Fig. 4 for an illustration), and then generalize it step by step. 

Domain[$P_1$], Range[$P_1$], YRange, SRange $\leftarrow \emptyset$; Zero, Cover, Bad $\leftarrow$ False; 
for $\mathcal{A}$'s $i$-th query $M^i \in \{0,1\}^*$, do 
    $M^i \leftarrow M^i \| 10^{\,n-1-(|M^i| \bmod n)}$; $M^i_1 M^i_2 \cdots M^i_{L^i} \leftarrow \mathrm{Partition}(M^i)$; 
    $S^i_0 \leftarrow 0^n$; $Y^i_0 \leftarrow 0^n$; 
    for $l \leftarrow 1$ to $L^i$ do 
        $X^i_l \leftarrow Y^i_{l-1} \oplus M^i_l$; 
        if $X^i_l \in \mathrm{Domain}[P_1]$ then $Y^i_l \leftarrow P_1(X^i_l)$; 
        else $Y^i_l \xleftarrow{\$} \{0,1\}^n \setminus \mathrm{Range}[P_1]$; 
            if $Y^i_l = 0^n$ then Zero $\leftarrow$ True; Bad $\leftarrow$ True; end if 
            Range[$P_1$] $\leftarrow$ Range[$P_1$] $\cup\ \{Y^i_l\}$; 
            Domain[$P_1$] $\leftarrow$ Domain[$P_1$] $\cup\ \{X^i_l\}$; 
        end if 
        $S^i_l \leftarrow S^i_{l-1} \oplus Y^i_l$; 
        if $Y^i_l \in \mathrm{YRange}$ and $S^i_l \in \mathrm{SRange}$ and 
            $\nexists\, j < i$ s.t. $M^i_1 \| M^i_2 \| \cdots \| M^i_l = M^j_1 \| M^j_2 \| \cdots \| M^j_l$ 
        then Cover $\leftarrow$ True; Bad $\leftarrow$ True; 
        else YRange $\leftarrow$ YRange $\cup\ \{Y^i_l\}$; SRange $\leftarrow$ SRange $\cup\ \{S^i_l\}$; 
        end if 
    end for 
end for 


Fig. 3. A program showing the process of CBC[$P_1$] 
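The program of Fig. 3 can be transliterated into runnable form. The following is a minimal sketch, not the authors' code: it assumes a toy block size (default n = 8), takes each query as an already padded list of n-bit integers, and uses hypothetical names such as `run_cbc_game`.

```python
import random

def run_cbc_game(queries, n=8, seed=0):
    """Process padded block-list queries with a lazily sampled P1 and
    return the flags Zero, Cover and Bad as a dict."""
    rng = random.Random(seed)
    p1 = {}                         # lazily sampled P1: Domain[P1] -> Range[P1]
    used_outputs = set()            # Range[P1]
    y_range, s_range = set(), set()
    seen_prefixes = set()           # block prefixes of earlier queries
    flags = {"Zero": False, "Cover": False, "Bad": False}
    for blocks in queries:
        y, s = 0, 0                 # Y_0 = S_0 = 0^n
        for l, m in enumerate(blocks, start=1):
            x = y ^ m
            if x in p1:
                y = p1[x]
            else:
                # fresh output drawn uniformly from {0,1}^n minus Range[P1]
                y = rng.choice([v for v in range(2 ** n) if v not in used_outputs])
                if y == 0:
                    flags["Zero"] = flags["Bad"] = True
                used_outputs.add(y)
                p1[x] = y
            s ^= y
            prefix = tuple(blocks[:l])
            trivial = prefix in seen_prefixes   # same l-block prefix as an earlier query
            if y in y_range and s in s_range and not trivial:
                flags["Cover"] = flags["Bad"] = True
            else:
                y_range.add(y)
                s_range.add(s)
            # prefixes of other lengths never collide with this one, so it is
            # safe to record it immediately
            seen_prefixes.add(prefix)
    return flags
```

Running the sketch over many seeds and query sets gives an empirical estimate of Pr[Bad = True] for small n, which can be compared against the bound of Lemma 1.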


Fig. 4. An inside view of the internal structure of CBC[$P_1$] 

4.1 The Most Common Case 

For a new input $X^i_l \notin \mathrm{Domain}[P_1]$, we choose a value $Y^i_l \xleftarrow{\$} \{0,1\}^n \setminus \mathrm{Range}[P_1]$ 
by lazy sampling. Since $Y^i_l$ is a new output, it is certain that $(Y^i_l, S^i_l)$ is consistent 
with the previous pairs for freeness. However, if it happens that 
$X^i_{l+1} = Y^i_l \oplus M^i_{l+1} \in \mathrm{Domain}[P_1]$, then the event $[Y^i_{l+1} \in \mathrm{YRange}]$ would occur, and the freeness 
consistence of pairs will rely only on the non-occurrence of the event 
$[S^i_{l+1} \in \mathrm{SRange}]$. Consider the following two subcases: 

1. $X^i_{l+1} = X^i_l$. This implies $Y^i_{l+1} = Y^i_l$ and $S^i_{l+1} = S^i_{l-1}$, thus undermining 
the freeness consistence. The probability for this event to occur is no more 
than $\Pr[X^i_{l+1} = X^i_l] = \Pr[Y^i_l = X^i_l \oplus M^i_{l+1}] \le \frac{1}{2^n - \#\mathrm{Range}[P_1]} \le \frac{1}{2^{n-1}}$, where 
we assume $\#\mathrm{Range}[P_1] \le 2^{n-1}$. 

2. $X^i_{l+1} \in \mathrm{Domain}[P_1] \setminus \{X^i_l\}$. This implies $Y^i_l \oplus M^i_{l+1} \in \mathrm{Domain}[P_1] \setminus \{X^i_l\}$, 
and so $Y^i_l$ has no more than $\#\mathrm{Domain}[P_1] \setminus \{X^i_l\}$ choices. Choose any one 
such choice and fix $Y^i_l$; then $Y^i_{l+1} = P_1(X^i_{l+1}) = P_1(Y^i_l \oplus M^i_{l+1})$ is 
fixed, and so is $S^i_{l+1} = \bigoplus_{c=1}^{l+1} Y^i_c$. On the other hand, the elements in SRange 
are $\bigoplus_{c=1}^{d} Y^j_c$ $(1 \le d \le L^j,\ 1 \le j \le i-1)$ and $\bigoplus_{c=1}^{\lambda} Y^i_c$ $(1 \le \lambda \le l)$. 
Then, the event $[S^i_{l+1} \in \mathrm{SRange}]$ implies no more than #SRange equations, 
all of which can be written as a linear combination of the $Y^a_b$ being equal to a linear 
combination of message blocks (i.e. $M^i_{l+1} \oplus M^a_{b+1}$ or $0^n$) with $0 \le b \le L^a$, 
$1 \le a \le i-1$ or $0 \le b \le l-1$, $a = i$. In particular, note that $Y^i_l$ is not 
included here because $X^i_{l+1} \in \mathrm{Domain}[P_1] \setminus \{X^i_l\}$ implies that $Y^i_l$ can be written 
as $M^i_{l+1} \oplus X^i_{l+1} = M^i_{l+1} \oplus Y \oplus M$, where $Y$ and $M$ appear in the previous 
$(Y, S)$ pairs and queries respectively ($Y$ may be $0^n$ if $b = 0$). Furthermore, 
notice that we have upper bounded $\Pr[Y = 0^n]$ by analyzing [Zero = True], 
so we can assume all $Y^a_b$ $(b \ge 1)$ are non-zero values. Then, excluding the 
trivial case that two different messages collide in their common prefix 
part, the probability for each of these equations to hold is no more than 
$1/2^{n-1}$, because all $Y^a_b$ $(b \ge 1)$ are chosen by the previous lazy samplings 
from a space of size roughly $2^n - \#\mathrm{Domain}[P_1] - \#\mathrm{Range}[P_1] - 1 \ge 2^{n-1}$. 
Here $2^n - \#\mathrm{Range}[P_1]$ is naturally understood, “1” accounts for $0^n$, and 
“$\#\mathrm{Domain}[P_1]$” accounts for the number of bad points that may result in 
$Y^a_b \oplus M^a_{b+1} \in \mathrm{Domain}[P_1]$. So the linear combinations of the $Y^a_b$ have at least $2^{n-1}$ 
possible values, and their real values are hidden in the internal structure 
CBC[$P_1$], not known by adversaries. So, in this subcase, 

$$\begin{aligned}
&\Pr[Y^i_{l+1} \in \mathrm{Range}[P_1] \setminus \{Y^i_l\} \wedge S^i_{l+1} \in \mathrm{SRange}] \\
&\quad= \Pr[X^i_{l+1} \in \mathrm{Domain}[P_1] \setminus \{X^i_l\} \wedge S^i_{l+1} \in \mathrm{SRange}] \\
&\quad= \Pr[Y^i_l \oplus M^i_{l+1} \in \mathrm{Domain}[P_1] \setminus \{X^i_l\} \wedge S^i_{l+1} \in \mathrm{SRange}] \qquad (2) \\
&\quad\le \Pr[Y^i_l \oplus M^i_{l+1} \in \mathrm{Domain}[P_1] \setminus \{X^i_l\}] \times \Pr[S^i_{l+1} \in \mathrm{SRange}] \qquad (3) \\
&\quad\le \frac{\#\mathrm{Domain}[P_1] \setminus \{X^i_l\}}{2^n - \#\mathrm{Range}[P_1]} \times \frac{\#\mathrm{SRange}}{2^{n-1}} \\
&\quad\le \frac{(\#\mathrm{Domain}[P_1])^2}{2^{2n-2}},
\end{aligned}$$

where we apply $\#\mathrm{Range}[P_1] \le 2^{n-1}$. Notice that $P_1[X^i_l] = Y^i_l \xleftarrow{\$} \{0,1\}^n \setminus \mathrm{Range}[P_1]$ 
is a new lazy sampling, and $S^i_{l+1} \in \mathrm{SRange}$ is only related to 
previous lazy samplings ($X^i_{l+1} \in \mathrm{Domain}[P_1] \setminus \{X^i_l\}$ implies that $Y^i_l = M^i_{l+1} \oplus Y \oplus M$ 
can be calculated from the previous pairs and queries), so the probability 
in (2) can be separated, and thus we obtain inequality (3). 

In this most common case, the probability for the lazy sampling $P_1[X^i_l] = Y^i_l \xleftarrow{\$} \{0,1\}^n \setminus \mathrm{Range}[P_1]$ 
to undermine the freeness consistence is at most $\frac{1}{2^{n-1}} + \frac{(\#\mathrm{Domain}[P_1])^2}{2^{2n-2}}$. 

4.2 Generalized Case 1 

The above lazy sampling may further induce the occurrence of the event 
$[X^i_{l+2} \in \mathrm{Domain}[P_1]]$, so the previous analysis is not complete, and here we generalize it 
in this direction. 

Suppose $P_1[X^i_l] = Y^i_l \xleftarrow{\$} \{0,1\}^n \setminus \mathrm{Range}[P_1]$ induces a series of occurrences, i.e. 
$[X^i_{l+1} \in \mathrm{Domain}[P_1]]$, $[X^i_{l+2} \in \mathrm{Domain}[P_1]]$, $\dots$, $[X^i_{l+u-1} \in \mathrm{Domain}[P_1]]$, with 
$u \le L^i - l + 1$, and let us consider the probability to undermine the freeness 
consistence. First, we have $\Pr[X^i_{l+1} = X^i_l] \le \frac{1}{2^{n-1}}$ as before. Then, conditioned 
on $X^i_{l+1} \ne X^i_l$, those $u-1$ events imply $Y^i_l \oplus M^i_{l+1} \in \mathrm{Domain}[P_1] \setminus \{X^i_l\}$ 
and $Y^i_{l+a} \oplus M^i_{l+a+1} \in \mathrm{Domain}[P_1]$ for $1 \le a \le u-2$, and so $Y^i_l$ has at most 
$\#\mathrm{Domain}[P_1] \setminus \{X^i_l\}$ choices. Choose any one such choice and fix $Y^i_l$; then $S^i_{l+a}$ 
$(0 \le a \le u-1)$ are also fixed. To keep freeness consistence, none of the events 
$[S^i_{l+1+a} \in \mathrm{SRange} \cup \{S^i_l, S^i_{l+1}, \dots, S^i_{l+a}\}]$ $(0 \le a \le u-2)$ should occur. These 
events imply no more than $(u-1)\#\mathrm{SRange} + \frac{(u-1)(u-2)}{2}$ equations, and each has 
a probability of $1/2^{n-1}$ to occur, for reasons similar to those given in the most 
common case. So, here the probability for this lazy sampling to undermine the freeness 
consistence is upper bounded by 
$\sum_{w=1}^{u}\left(\frac{1}{2^{n-1}} + \frac{(\#\mathrm{Domain}[P_1] + w - 1)^2}{2^{2n-2}}\right)$. Notice that $u$ is the number of invocations to 
$P_1$ related to the lazy sampling $P_1[X^i_l] = Y^i_l \xleftarrow{\$} \{0,1\}^n \setminus \mathrm{Range}[P_1]$. 

4.3 Generalized Case 2 

Since we assume adversaries can make any $q$ pairwise distinct queries $M^1, M^2, 
\dots, M^q$, it is possible that some queries share a common prefix. Here we 
generalize the probability for the lazy sampling $P_1[X^i_l] = Y^i_l \xleftarrow{\$} \{0,1\}^n \setminus \mathrm{Range}[P_1]$ to 
undermine the freeness consistence in this direction. 

Without loss of generality, we assume $M^i, M^{i+1}, \dots, M^{i+v-1}$ share a common 
prefix (this can be achieved by sorting the queries), and $M^i_l$ is the last block 
in their prefix. If $X^{i+b}_{l+1} = Y^{i+b}_l \oplus M^{i+b}_{l+1} \notin \mathrm{Domain}[P_1]$ for all $b \in [0, v-1]$, 
then $Y^{i+b}_{l+1}$ can keep freeness consistence. However, if $\exists b \in [0, v-1]$ s.t. 
$X^{i+b}_{l+1} = Y^{i+b}_l \oplus M^{i+b}_{l+1} = X^{i+b}_l = X^i_l$, then the events $[Y^{i+b}_{l+1} = Y^{i+b}_l]$ and $[S^{i+b}_{l+1} = S^{i+b}_{l-1}]$ will 
occur, and thus undermine the freeness consistence. This probability is no more 
than $\Pr[\exists b \in [0, v-1],\ X^{i+b}_{l+1} = X^{i+b}_l] \le \frac{v}{2^{n-1}}$. Based on its non-occurrence, we 
focus on the probability of $[\exists b \in [0, v-1],\ X^{i+b}_{l+1} \in \mathrm{Domain}[P_1] \setminus \{X^i_l\}]$. Note that 
some particular choices of $Y^i_l$ may result in several events $[X^{i+b}_{l+1} \in \mathrm{Domain}[P_1] \setminus \{X^i_l\}]$ 
occurring at the same time, and the number of $Y^i_l$ that induce $v'$ such events is 
no more than $\#\mathrm{Domain}[P_1] \cdot v / v'$. W.l.o.g. we assume $X^i_{l+1}, X^{i+1}_{l+1}, \dots, X^{i+v'-1}_{l+1} \in 
\mathrm{Domain}[P_1] \setminus \{X^i_l\}$ for some $v' \in [1, v]$. Choose any one such $Y^i_l$ and fix it; 
then $Y^i_{l+1}, Y^{i+1}_{l+1}, \dots, Y^{i+v'-1}_{l+1}$ are fixed, and so are $S^i_{l+1}, S^{i+1}_{l+1}, \dots, S^{i+v'-1}_{l+1}$. The 
events $[S^{i+j}_{l+1} \in \mathrm{SRange} \cup \{S^i_{l+1}, S^{i+1}_{l+1}, \dots, S^{i+j-1}_{l+1}\}]$ $(0 \le j \le v'-1)$ imply no 
more than $v'\#\mathrm{SRange} + \frac{v'(v'-1)}{2}$ equations, with probability $1/2^{n-1}$ to occur 
each. Then it is not hard to get that the probability to undermine the freeness consistence in this 
case is no more than 
$\sum_{b=1}^{v}\left(\frac{1}{2^{n-1}} + \frac{(\#\mathrm{Domain}[P_1] + b - 1)^2}{2^{2n-2}}\right)$. Notice that $v$ is the number of invocations to $P_1$ related to the 
lazy sampling $P_1[X^i_l] = Y^i_l \xleftarrow{\$} \{0,1\}^n \setminus \mathrm{Range}[P_1]$. 


4.4 The Most General Case 


Based on the above, we generalize the most common case in two directions, as 
in Generalized Cases 1 and 2. 

The analysis here is the same as that in Generalized Case 2 until $Y^i_l$ is fixed, 
and w.l.o.g. we assume that $X^i_{l+1}, X^{i+1}_{l+1}, \dots, X^{i+v'-1}_{l+1} \in \mathrm{Domain}[P_1] \setminus \{X^i_l\}$ for some 
$v' \in [1, v]$ occurs. Then we take Generalized Case 1 into account. 

Suppose that for $X^{i+b}_{l+1}$ $(0 \le b \le v'-1)$, its following calls to $P_1$ satisfy $X^{i+b}_{l+2}, \dots, 
X^{i+b}_{l+u[b]-1} \in \mathrm{Domain}[P_1]$, with $u[b] \le L^{i+b} - l + 1$. Then $S^{i+b}_{l}, S^{i+b}_{l+1}, \dots, S^{i+b}_{l+u[b]-1}$ 
can be fixed by $Y^i_l$. The events $[S^{i+b}_{l+1+a} \in \mathrm{SRange} \cup \{S^{i+b}_l, \dots, S^{i+b}_{l+a}\}]$ with 
$0 \le a \le u[b] - 2$ and $0 \le b \le v'-1$ imply no more than $\sum_{w=1}^{s}(\#\mathrm{SRange} + w - 1)$ 
equations ($s = \sum_{b} u[b]$), with probability $1/2^{n-1}$ to occur each. Then we can get that 
the probability for the lazy sampling $P_1[X^i_l] = Y^i_l \xleftarrow{\$} \{0,1\}^n \setminus \mathrm{Range}[P_1]$ to undermine 
the freeness consistence is at most 
$\sum_{w=1}^{s}\left(\frac{1}{2^{n-1}} + \frac{(\#\mathrm{Domain}[P_1] + w - 1)^2}{2^{2n-2}}\right)$. Notice that $s = \sum_{b} u[b]$ is the number of 
invocations to $P_1$ related to the lazy sampling $P_1[X^i_l] = Y^i_l \xleftarrow{\$} \{0,1\}^n \setminus \mathrm{Range}[P_1]$. 


4.5 Summing Up 

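From the most common case to the most general case, we have observed that for 
every lazy sampling $P_1[X^i_l] = Y^i_l \xleftarrow{\$} \{0,1\}^n \setminus \mathrm{Range}[P_1]$, its probability to 
undermine the freeness consistence is no more than $\sum_{w=1}^{s}\left(\frac{1}{2^{n-1}} + \frac{(\#\mathrm{Domain}[P_1] + w - 1)^2}{2^{2n-2}}\right)$, 
where $s$ is the number of invocations to $P_1$ related to this lazy sampling. Suppose that 
in dealing with $M^1, M^2, \dots, M^q$, we do $z$ lazy samplings in total, and the numbers of 
invocations to $P_1$ related to them are $s_1, s_2, \dots, s_z$ respectively. Thus, 

$$\begin{aligned}
\Pr[\text{Cover} = \text{True}] &\le \sum_{j=1}^{z} \Pr[\text{Cover} = \text{True in lazy sampling } j] \\
&\le \sum_{j=1}^{z}\sum_{w=1}^{s_j}\left(\frac{1}{2^{n-1}} + \frac{(\#\mathrm{Domain}[P_1] + w - 1)^2}{2^{2n-2}}\right) \\
&\le \frac{q\ell_{\max} + q}{2^{n-1}} + \frac{q^3\ell_{\max}^3}{2^{2n-2}},
\end{aligned}$$

where we apply $\sum_{j=1}^{z} s_j \le q(\ell_{\max} + 1)$ and note that $\#\mathrm{Domain}[P_1]$ is a variable 
growing from 0 to some value no larger than $q(\ell_{\max} + 1)$ with the lazy samplings. 

At last, we get $\Pr[\exists i \in [1, q] : \text{Case D occurs}] = \Pr[\text{Bad} = \text{True}] \le \Pr[\text{Zero} = \text{True}] + \Pr[\text{Cover} = \text{True}] 
\le \frac{q\ell_{\max} + q}{2^{n-1}} + \frac{q\ell_{\max} + q}{2^{n-1}} + \frac{q^3\ell_{\max}^3}{2^{2n-2}} = \frac{q\ell_{\max} + q}{2^{n-2}} + \frac{q^3\ell_{\max}^3}{2^{2n-2}}$. □ 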

5 Some Suggestions 

The key size of 3kf9 is three times that of its underlying blockcipher, and this 
may be too large to be stored securely in some resource-restricted environments. 
For such cases, we give the following solutions: 

1. Derive a master key $K \xleftarrow{\$} \{0,1\}^k$, and generate $K_i = E_K(\mathrm{Cst}_i)$ $(i = 1, 2, 3)$ 
with three different constants $\mathrm{Cst}_i$. Then we need only to store the master 
key $K$ securely. The security of the resulting scheme is still guaranteed by 
the PRP assumption on the blockcipher E. 

2. Derive $K_1 \xleftarrow{\$} \{0,1\}^k$, and generate $K_i = K_1 \oplus \mathrm{Cst}_i$ for $i = 2, 3$, with two 
non-zero constants $\mathrm{Cst}_2$, $\mathrm{Cst}_3$. Then we need only to store $K_1$ securely. However, 
this solution requires that the blockcipher E be an RK-PRP (pseudorandom 
against a kind of related-key attacks) [3]. 

We warn that generating $K_2 = E_{K_1}(\mathrm{Cst}_2)$ and $K_3 = E_{K_1}(\mathrm{Cst}_3)$ may result 
in security flaws in 3kf9, because $E_K(K \oplus \cdot)$ may not reach pseudorandomness 
given only that E is a PRP [28]. 

3. Adopt a beyond-birthday-bound tweakable blockcipher TBC as the underlying 
primitive in 3kf9. Then, we can replace $E_{K_1}$, $E_{K_2}$ and $E_{K_3}$ by $\mathrm{TBC}^{T_1}$, 
$\mathrm{TBC}^{T_2}$ and $\mathrm{TBC}^{T_3}$, where $T_1, T_2, T_3$ are three public tweaks. Such a TBC 
has recently been introduced by Landecker, Shrimpton and Terashima, 
but the current TBC scheme still needs its key size reduced. 
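The first two key-derivation options can be sketched in a few lines. This is only an illustration under loud assumptions: `toy_E` is a hypothetical, insecure 4-round Feistel stand-in for the blockcipher E (built from SHA-256 so the example needs no external library), the key length is taken equal to the block length (16 bytes), and the constants are arbitrary choices, not values from the paper.

```python
import hashlib

BLOCK = 16  # bytes; stand-in for n = k = 128 bits (assumption)

def toy_E(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel permutation keyed by `key`.
    Illustration only: NOT a secure blockcipher."""
    assert len(block) == BLOCK
    left, right = block[:8], block[8:]
    for rnd in range(4):
        f = hashlib.sha256(key + bytes([rnd]) + right).digest()[:8]
        left, right = right, bytes(a ^ b for a, b in zip(left, f))
    return left + right

def derive_keys_option1(master: bytes) -> list:
    """Option 1: K_i = E_K(Cst_i) for three different constants Cst_i."""
    constants = [i.to_bytes(BLOCK, "big") for i in (1, 2, 3)]
    return [toy_E(master, c) for c in constants]

def derive_keys_option2(k1: bytes, cst2: bytes, cst3: bytes) -> list:
    """Option 2: K_i = K_1 xor Cst_i for i = 2, 3 (needs a related-key-secure E)."""
    xor = lambda a, b: bytes(x ^ y for x, y in zip(a, b))
    return [k1, xor(k1, cst2), xor(k1, cst3)]
```

Because a blockcipher is a permutation for each key, option 1 always yields three distinct subkeys from three distinct constants; option 2 is cheaper but shifts the burden to the related-key security of E.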

Since CMAC has been widely used in practical applications, someone may 
want to use $\mathrm{CMAC}_{K_1}(\cdot) \oplus \mathrm{CMAC}_{K_2}(\cdot)$ to get a highly secure MAC. We note that 
the precise security of this proposal is still unclear, and it is rate-2 (two blockcipher 
invocations per message block), implying more power consumption and lower efficiency 
in serial implementations. 

6 Conclusion 

In this paper, we propose 3kf9, a rate-1 CBC-based MAC with provable security beyond the 
birthday bound. 3kf9 is efficient owing to its rate-1 design, and highly 
secure owing to its $O\!\left(\frac{\ell^3 q^3}{2^{2n}} + \frac{\ell q}{2^n}\right)$ PRF bound. Moreover, 3kf9 is light in the sense 
that it needs only XOR operations besides blockcipher invocations, and thus 
it immediately turns into a lightweight MAC when equipped with a lightweight 
blockcipher. However, its key size seems to be too large for some particular 
environments and therefore requires further improvement. 

Abstract. We develop a conceptual approach for probabilistic analysis of adaptive 
adversaries via Maurer’s methodology of random systems (Eurocrypt’02). 
We first consider a well-known comparison theorem of Maurer according to 
which, under certain hypotheses, adaptivity does not help for achieving a certain 
event. This theorem has subsequently been misinterpreted, leading to a misrepre- 
sentation with one of Maurer’s hypotheses being omitted in various applications. 
In particular, the only proof of (a misrepresentation of) the theorem available in 
the literature contained a flaw. We clarify the theorem by pointing out a simple 
example illustrating why the hypothesis of Maurer is necessary for the compari- 
son statement to hold and provide a correct proof. Furthermore, we prove several 
technical statements applicable in more general settings where adaptivity might 
be helpful, which can be seen as the random system analogue of the game-playing 
arguments recently proved by Jetchev, Ozen and Stam (TCC’12). 


1 Introduction 

One of the key concepts in cryptographic security definitions and proofs is the notion 
of indistinguishability. In the information-theoretic setting, the simplest example 
is how easy it is for a computationally unbounded adversary to distinguish two ran- 
dom variables X and Y based on a single sample from either of the two variables. It 
is not hard to see that the success probability of the optimal distinguishing algorithm 
(the distinguisher’s advantage) is simply the statistical distance of the two probability 
distributions for X and Y. Yet, the analysis of current cryptographic systems typically 
requires much more than distinguishing two random variables. For instance, the related 
cryptographic primitive of a pseudo-random function allows an adversary to make mul- 
tiple queries and hence, obtain multiple related samples in order to distinguish between 
either a truly random function or a pseudo-random one. Moreover, the distinguisher can 
interact with the system by choosing the queries adaptively, i.e., based on the previous 
queries and corresponding responses. Adversarial adaptivity is notoriously difficult to 
deal with, not only in the context of pseudorandomness, but across the cryptologic land- 
scape. 
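The single-sample claim can be checked on small distributions. The sketch below is an illustration only: distributions are given as dictionaries mapping outcomes to exact probabilities, and the function names are hypothetical. It computes the statistical distance and, separately, the advantage of the distinguisher that outputs 1 exactly on those outcomes that are more likely under X than under Y.

```python
from fractions import Fraction

def statistical_distance(p: dict, q: dict) -> Fraction:
    """Half the L1 distance between two finite distributions."""
    support = set(p) | set(q)
    total = sum(abs(Fraction(p.get(x, 0)) - Fraction(q.get(x, 0))) for x in support)
    return Fraction(total) / 2

def best_single_sample_advantage(p: dict, q: dict) -> Fraction:
    """Advantage of the distinguisher that accepts exactly where P_X > P_Y."""
    support = set(p) | set(q)
    accept = [x for x in support if Fraction(p.get(x, 0)) > Fraction(q.get(x, 0))]
    under_p = sum(Fraction(p.get(x, 0)) for x in accept)
    under_q = sum(Fraction(q.get(x, 0)) for x in accept)
    return Fraction(under_p - under_q)

if __name__ == "__main__":
    uniform_bit = {0: Fraction(1, 2), 1: Fraction(1, 2)}
    biased_bit = {0: Fraction(3, 4), 1: Fraction(1, 4)}
    print(statistical_distance(uniform_bit, biased_bit))
    print(best_single_sample_advantage(uniform_bit, biased_bit))
```

Both functions return the same value on any pair of finite distributions, which is the content of the claim for a single sample; with several adaptively chosen queries no such closed form is available in general.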

With the increasing number of sophisticated cryptographic schemes appearing in 
the literature (e.g., authenticated encryption, compression functions, message 
authentication codes), proving even relatively straightforward security notions 
such as pseudorandomness or collision resistance becomes ever more 
involved and complicated. Even though the building blocks of the proofs rarely extend 
beyond basic notions such as conditional probabilities, Bayes’ rule or basic concepts 
from stochastic processes, combining these building blocks into a rigorous proof poses 
a challenge in many cases. Consequently, developing a more conceptual approach 
towards rigorous security analyses of adaptive adversaries is an important challenge in 
theoretical cryptology. 


Games and Random Systems. One of the general methods for security proofs is based 
on “game-playing” [2,8,16]. A common technique involves the introduction to the game 
of a flag bad (initially set to false). The fundamental lemma of game playing [2, 
§3.4] states that for games that are identical until bad, the advantage in distinguishing 
between these games is at most the probability of setting bad to true. Several common 
and a few new techniques employed to prove preimage and collision security of 
compression functions based on ideal primitives were recently abstracted using game 
playing by Jetchev, Ozen and Stam. 
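For reference, the fundamental lemma is usually written as follows. This is the standard formulation; the names $G$, $H$ and $\mathcal{A}$ are placeholders introduced here for two games that are identical until bad and for the adversary.

```latex
\[
\left|\Pr[\mathcal{A}^{G} \Rightarrow 1] - \Pr[\mathcal{A}^{H} \Rightarrow 1]\right|
\;\le\; \Pr[\mathcal{A}^{G} \text{ sets } \mathsf{bad}] .
\]
```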

A different approach to indistinguishability and probabilistic analysis of adaptive 
adversaries is through the concept of random systems, as introduced by Maurer. This 
abstraction unifies many existing security proofs and it allows for proving new 
indistinguishability results. Intuitively, a random system takes a generally unbounded sequence 
of inputs (queries) and produces an output (response) for each input using a specific 
source of randomness. Random systems are rigorously modeled in such a way that they 
exploit the input-output behavior by specifying (abstractly) a set of conditional 
probability distributions (see the definition in Section 2 for more details). 

A distinguisher (formally defined later) can be thought of as another random system that 
is allowed to query either one of the two random systems and that outputs a binary 
decision bit at the end. Estimating the advantage in the case of non-adaptive adversaries 
is often much simpler than estimating the advantage for adaptive ones. Maurer gave a 
two step approach to deal with adaptive distinguishers effectively. 

First, in analogy with the fundamental lemma of game playing, it is always possible 
to rephrase the problem of upper bounding the advantage of any adversary in distin- 
guishing two arbitrary random systems into one where an adversary has to provoke 
an event instead (Maurer, Thm. 1). Most of the indistinguishability proofs indeed follow 
along these lines. 

Next, Maurer (Thm. 2) presented a result stating that, under certain hypotheses, 
adaptivity does not help to cause an event. Throughout the paper, we often refer to this 
statement as the adaptive-non-adaptive (ANA) switching lemma. It 
can also be used in the context of events that are meaningful in their own right, such as 
finding collisions for a hash function. 


Our Contribution. In this paper, we revisit and refine the currently existing 
techniques based on random systems for bounding the advantage of an adaptive adversary 
for provoking a certain event. Our contribution is twofold. On the one hand, we show 
that Maurer’s phrasing of the ANA switching lemma has been misinterpreted, 
in the sense that an essential hypothesis has been omitted in subsequent applications. 
This applies to the only proof given in the literature (by Pietrzak, §3.2), which 
consequently contains an incorrect step. We restate and prove a corrected version that 
luckily works for most uses of the lemma in the literature. We explain why the original 
hypothesis is indeed necessary by providing a simple example where adaptivity does 
help, yet where the remaining hypotheses are satisfied. On the other hand, we 
examine existing techniques to bound the advantage of adaptive adversaries directly in 
the context of random systems. This can be seen as a generalization of the earlier work 
by Jetchev, Ozen and Stam. 

The example is rather simple and intuitive: finding a fixed point in a uniformly 
random permutation. Here, one can easily see that adaptivity is helpful after the first 
query/response pair is obtained since (assuming that the first query has not produced a 
fixed point) an adaptive adversary can choose its second query based on the response 
to the first query and the condition that there is no fixed point yet. 
Indeed, an adaptive adversary can already eliminate one choice for the second query 
(two for the third and so on), as opposed to a non-adaptive adversary who commits 
all of its queries in advance. Thus the best adaptive adversary will have a significantly 
better advantage than any non-adaptive one. Nevertheless, as we demonstrate, the hy- 
potheses of Pietrzak’s (mis)interpretation of the ANA switching lemma are satisfied, 
thus completing our counterexample. 
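The gap between adaptive and non-adaptive adversaries in this example can be verified exhaustively for a tiny domain. The sketch below is an illustration with hypothetical function names: it enumerates all permutations of {0, ..., N-1} and compares a fixed set of queries against one simple adaptive strategy, which never asks a point that has already appeared as a response (such a point cannot be a fixed point).

```python
from fractions import Fraction
from itertools import permutations

def nonadaptive_success(N: int, queries: list) -> Fraction:
    """Probability that a fixed list of queries hits a fixed point."""
    perms = list(permutations(range(N)))
    wins = sum(any(pi[x] == x for x in queries) for pi in perms)
    return Fraction(wins, len(perms))

def adaptive_success(N: int, q: int) -> Fraction:
    """Probability of success for the adaptive strategy with q queries."""
    perms = list(permutations(range(N)))
    wins = 0
    for pi in perms:
        asked, responses = set(), set()
        for _ in range(q):
            # skip points already asked and points already seen as responses
            candidates = [x for x in range(N) if x not in asked and x not in responses]
            if not candidates:
                break
            x = candidates[0]
            asked.add(x)
            if pi[x] == x:
                wins += 1
                break
            responses.add(pi[x])
    return Fraction(wins, len(perms))

if __name__ == "__main__":
    print(nonadaptive_success(4, [0, 1]), adaptive_success(4, 2))
```

By symmetry every pair of distinct non-adaptive queries succeeds with the same probability, so for N = 4 and two queries the comparison covers the best non-adaptive adversary as well.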

We proceed to examine Pietrzak’s proof of the lemma to determine what underlies 
the mistake and whether the proof can be fixed. To some extent, the problem originates 
from the elliptical notation that the theory of random systems occasionally suffers from. 
We propose a restatement of the lemma (as a theorem) together with a correct proof. We 
then perform the important (if somewhat tedious) task of investigating known exam- 
ples in the literature where an incorrect version of the ANA switching lemma has been 
exploited (see the full version). Fortunately, to the best of our knowledge, the flaw un- 
covered by us does not lead to a violation of any security claim based on the incorrect 
ANA switching lemma (as the modified hypotheses are still satisfied). 

Our second contribution is a string of technical statements, all phrased in the 
language of random systems, that are applicable in the more general setting where 
adaptivity might be helpful in triggering an event. The first result is the 
random system interpretation of a well-known technique, where a union bound is 
computed over the subevent that an adversary provokes the event at the jth step, where the 
required “stepwise” probabilities (for the subevents) are maximized in a greedy-type 
manner. This is a standard and often-used argument from security proofs that has not 
been previously linked to random systems. It makes derivation of the overall bound 
relatively easy. Yet, in many cases the overall upper bound is not tight enough due to the 
maximal probabilities occurring for rather unlikely query/response histories or due to 
overcounting. 

Several proofs in the literature tackle the problem of “bad” query/response history by 
the introduction of an auxiliary event explicitly bounding such a bad history occurring. 
Subsequently bounding the probabilities of, on the one hand, the auxiliary 
event and, on the other, the actual event conditioned on the auxiliary bad event not 
occurring leads to a tighter bound. A second proposition generalizes this method in the context 
of random systems. 

Lee et al. recently introduced “wish lists” to the analysis of adaptive adversaries 
to limit the effect of overcounting. The idea is to cut up the analysis in two parts. First, 
one upper bounds the maximum size W of the wish list, i.e., the total number of 
query-response pairs that could ever lead to an adversarial win (to get useful bounds, one 
typically needs to introduce an auxiliary flag as in the discussion above). Next, one 
upper bounds the probability p of any particular wish to be granted, i.e., the probability 
that a query on the wish list gets the wished-for response when actually being asked 
by the adversary. Finally, one observes that in order to win, at some point an adversary 
needs to have some wish granted. Intuitively, a union bound over all wishes in the list 
means the advantage of an adaptive adversary is then at most pW. We formalize this 
approach in a proposition that assumes as a hypothesis an upper bound on the sum 
of the stepwise probabilities of success for each query/response history and thus avoids 
the greedy-type argument. We refine this in a further proposition by adding an auxiliary flag 
event. 

Yet, the most subtle and useful (in terms of applications) bounds are provided in 
a final proposition. Here, an adaptive adversary is trying to achieve a certain event more 
than once. A simple example is an adversary trying to obtain more than k fixed points 
in a random permutation, but it could also relate to a scenario where an adversary needs 
to see multiple wishes being granted. The techniques we develop here are very similar 
to those used for the analysis of a recent incidence-based compression function 
construction. We illustrate the usefulness of our result by revisiting the analysis of an 
auxiliary collinearity event needed for the security proof of that construction (see the 
full version). The strong emphasis on conditional probabilities in the random systems 
methodology makes it very natural to express the various bounds on an adaptive adver- 
sary’s advantage, providing a different and arguably clearer perspective on the original 
proof. 


Related Work. Modification of the adversary is an important technique, orthogonal 
to our work, that is often used to bound the advantage of an adaptive adversary. In 
particular free queries have been used to great effect in the analysis of double length 
hash functions. A typical proof will first modify the adversary by adding the free 
queries, with the somewhat paradoxical effect of taking away some of the adaptivity of the 
adversary by making it more powerful, followed by an analysis of the 
advantage of this modified adversary. For bounding the advantage of the modified adversary 
our work comes into play. 

Very recently, during their analysis of key-alternating ciphers, Bogdanov et al. [?]
uncovered an interesting scenario where a distinguisher surprisingly benefits from
adaptivity. While it would be straightforward to describe their problem (and the
supporting counterexample) in the random systems framework and subsequently apply the
first step of Maurer’s two-step approach to move it from distinguishing to causing an
event, the resulting event cannot be expressed as a predicate, ruling out direct
application of many of our theorems. It is an interesting open problem to see if our approach
can be extended to improve upon the bounds already obtained by Steinberger [?] and
Bogdanov et al.
2 Preliminaries

Notation. Following the terminology and notation of [?], we denote random
variables by capital letters (e.g., $X$), their values by lower-case letters (e.g., $x$) and
their finite$^1$ sampling spaces by calligraphic letters (e.g., $\mathcal{X}$). For a fixed sample space
$\mathcal{X}$, let $\mathcal{X}^k$ be the $k$-fold Cartesian product of $\mathcal{X}$. The corresponding random variables
and their values are denoted analogously (i.e., $X^k$ and $x^k$, respectively). For brevity,
we use $P_A[a]$ to denote the probability $\Pr[A = a]$ and similarly, $P_{A|BC}[a, b, c]$ for
$\Pr[A = a \mid B = b \wedge C = c]$. If it is clear from the context, we sometimes omit the
specific values and simply use, e.g., $P_{A|BC}$ to denote $\Pr[A = a \mid B = b \wedge C = c]$.

Random Systems. Various cryptographic systems can be seen as random systems [?]
that are modeled as the mathematical abstraction of interactive systems: an
$(\mathcal{X},\mathcal{Y})$-random system takes the inputs $X_1, X_2, \ldots \in \mathcal{X}$ and for each input $X_i$, it generates
an output $Y_i \in \mathcal{Y}$ depending probabilistically on $X^i = (X_1, \ldots, X_i)$ and
$Y^{i-1} = (Y_1, \ldots, Y_{i-1})$. Random systems have been used in the literature (see e.g., [?]) to
unify, simplify, generalize, and in some cases strengthen security proofs.

Definition 1 (Random System). An $(\mathcal{X},\mathcal{Y})$-random system F is a (possibly infinite)
sequence of conditional probability distributions $P^F_{Y_i|X^iY^{i-1}}$ for $i \ge 1$; specifically, the
distribution of the outputs $Y_i$ conditioned on $X^i = x^i$ (i.e., the ith query $x_i$ and all
previous queries $x^{i-1} = (x_1, \ldots, x_{i-1})$) and $Y^{i-1} = y^{i-1}$ (i.e., all previous outputs
$y^{i-1} = (y_1, \ldots, y_{i-1})$). Define

$$P^F_{Y^i|X^i} := \prod_{j=1}^{i} P^F_{Y_j|X^jY^{j-1}},$$

where, for completeness, $P^F_{Y_1|X^1Y^0} := P^F_{Y_1|X^1} = P^F_{Y_1|X_1}$. Two $(\mathcal{X},\mathcal{Y})$-random systems
F and G are said to be equivalent (denoted by $F \equiv G$) if $P^F_{Y_i|X^iY^{i-1}} = P^G_{Y_i|X^iY^{i-1}}$
for all $i \ge 1$ and all arguments $(x^i, y^i) \in \mathcal{X}^i \times \mathcal{Y}^i$.

Example 2 (Random system). Random functions and random permutations are special
cases of random systems. If $(\mathcal{X},\mathcal{Y})$ is any pair of sets, a random function $\mathcal{X} \to \mathcal{Y}$ is
a random variable whose values are functions $\mathcal{X} \to \mathcal{Y}$. For any finite set $\mathcal{X}$, a random
permutation is a random variable taking values in the set of permutations of $\mathcal{X}$. A
uniformly random function R is a random function with uniform distribution over all
functions $\mathcal{X} \to \mathcal{Y}$. Using random systems, we have the following:

$$P^R_{Y_i|X^iY^{i-1}}(y_i, x^i, y^{i-1}) =
\begin{cases}
1 & \text{if } x_i = x_j \text{ for some } j < i \text{ and } y_i = y_j,\\
0 & \text{if } x_i = x_j \text{ for some } j < i \text{ and } y_i \ne y_j,\\
1/|\mathcal{Y}| & \text{else.}
\end{cases} \qquad (1)$$

A uniformly random permutation is defined analogously.
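
The case distinction in (1) can be written out as a small function. This is an illustrative sketch only: the name `urf_cond_prob`, the list-based history, and the assumption that the history passed in is itself consistent (repeated queries received repeated answers) are choices made here and are not notation from the paper.

```python
from fractions import Fraction


def urf_cond_prob(y_i, xs, ys, range_size):
    """Probability that a uniformly random function answers y_i to the
    latest query xs[-1], given the earlier queries xs[:-1] and their
    answers ys (hypothetical helper mirroring the three cases of Eq. (1))."""
    x_i = xs[-1]
    for x_j, y_j in zip(xs[:-1], ys):
        if x_j == x_i:
            # Repeated query: the answer is forced to equal the earlier one.
            return Fraction(1) if y_i == y_j else Fraction(0)
    # Fresh query: the answer is uniform over the whole range.
    return Fraction(1, range_size)
```

For every fixed history the values sum to one over all candidate answers, which is what makes each step a conditional probability distribution.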


1 Most of the results and arguments in this paper generalize to infinite sampling spaces; for
simplicity, we restrict to finite spaces as the latter are the ones relevant for cryptographic
applications.
Distinguishing Random Systems. In order to distinguish two $(\mathcal{X},\mathcal{Y})$-random
systems F and G, we use the notion of a distinguisher that can be regarded as a random
system itself. A distinguisher interacts with random systems by making queries to ei- 
ther F or G and outputs a binary decision bit after a certain number of queries. In the 
sequel, we consider information-theoretic distinguishers only; they are computation- 
ally unbounded and the only measure of complexity is the number of queries made 
by them. 

In the literature, distinguishers are classified based on how they interact with the 
random systems. For instance, adaptive distinguishers choose their ith query $X_i$
depending on the history (i.e., all previous query-response pairs), whereas non-adaptive
distinguishers commit to all their queries in advance. Throughout, we let Ad and NAd be
the classes of all adaptive and non-adaptive distinguishers, respectively. Definition 3
formally introduces the concept of a distinguisher as well as its interaction with random
systems via probability theory.

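Definition 3 (Distinguisher). An $(\mathcal{X},\mathcal{Y})$-distinguisher D is a $(\mathcal{Y},\mathcal{X})$-random system
defined by a sequence of conditional probability distributions $P^D_{X_i|X^{i-1}Y^{i-1}}$. That is, it
is a $(\mathcal{Y},\mathcal{X})$-random system that is one query ahead. An $(\mathcal{X},\mathcal{Y})$-distinguisher D and an
$(\mathcal{X}',\mathcal{Y}')$-random system F are said to be compatible if $\mathcal{X}' = \mathcal{X}$ and $\mathcal{Y}' = \mathcal{Y}$.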


One models the interaction of a distinguisher with a random system via a random
experiment that is a sequence of conditional probability distributions. This is denoted by
$P^{D \diamond F}_{X_iY_i|X^{i-1}Y^{i-1}}$ and defined simply as

$$P^{D \diamond F}_{X_iY_i|X^{i-1}Y^{i-1}} := P^D_{X_i|X^{i-1}Y^{i-1}} \cdot P^F_{Y_i|X^iY^{i-1}}.$$

Intuitively, this models the probabilities of the distinguisher choosing a given query $x_i$
at the ith step and the random system returning a given response $y_i$ conditioned on the
history. Moreover, we define

$$P^{D \diamond F}_{X^iY^i} := \prod_{j=1}^{i} P^{D \diamond F}_{X_jY_j|X^{j-1}Y^{j-1}}.$$

We are now interested in distinguishing two random systems F and G where we as- 
sume that both systems are compatible with the distinguisher D. The performance of 
D (known as the advantage of D in distinguishing F from G) is generally measured as 
follows: 

Definition 4. Let F and G be two $(\mathcal{X},\mathcal{Y})$-random systems that are compatible with a
distinguisher D. Given an integer $i > 0$, the advantage of D in distinguishing F from
G in i queries is defined to be

$$\Delta^D_i(F, G) := \frac{1}{2} \sum_{(x^i,y^i) \in \mathcal{X}^i \times \mathcal{Y}^i} \left| P^{D \diamond F}_{X^iY^i} - P^{D \diamond G}_{X^iY^i} \right|.$$

Let C be a class of distinguishers trying to distinguish F from G. We define the
advantage of the best C-distinguisher making i queries to F and G as

$$\Delta^C_i(F, G) := \max_{D \in C} \{ \Delta^D_i(F, G) \}.$$
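
As a small worked instance of Definition 4, the sketch below evaluates the sum exactly for one particular distinguisher: a deterministic one that asks i fixed, pairwise distinct queries to either a uniformly random function or a uniformly random permutation on a domain of size n. For such a distinguisher only one query tuple has nonzero probability, so the sum runs over response tuples alone. The function names and the choice of the two systems are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def transcript_prob_urf(ys, n):
    # Distinct queries to a uniformly random function: answers are i.i.d. uniform.
    return Fraction(1, n ** len(ys))


def transcript_prob_urp(ys, n):
    # Distinct queries to a uniformly random permutation: answers are a
    # uniformly random tuple of pairwise distinct values.
    if len(set(ys)) != len(ys):
        return Fraction(0)
    p = Fraction(1)
    for k in range(len(ys)):
        p *= Fraction(1, n - k)
    return p


def advantage(n, i):
    """Half the L1 distance between the two transcript distributions
    (requires i <= n so that i distinct queries exist)."""
    total = Fraction(0)
    for ys in product(range(n), repeat=i):
        total += abs(transcript_prob_urf(ys, n) - transcript_prob_urp(ys, n))
    return total / 2
```

For n = 4 and i = 2 this evaluates to 1/4, which is the probability that two uniform answers collide.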
Random Systems with Monotone Conditions. One of the similarities between random
systems and game-playing is a notion known as a monotone condition or monotone
event. Intuitively, it represents an event that, once set, cannot be “reset” by additional
queries. The notion of monotone event/condition is more general and should not be
confused with monotone predicate (or monotone binary output as discussed in [?, §2.3]).

To explain the difference, let $A = \{a_i\}$ be the sequence of events $a_1, a_2, \ldots$

Monotone predicates (or binary outputs) are simpler and less general since the
query/response pairs $(x^i, y^i)$ at step i uniquely determine whether the corresponding
event $a_i$ holds or not, whereas monotone events in general could be more complex (e.g., $a_i$
could be the (monotone) event that a certain flag is set in at most 10 steps). In other words, in the
case of monotone predicates, the conditional probability of $a_i$ occurring conditioned on
$X^i = x^i \wedge Y^i = y^i$ is binary, whereas monotone events could be more general. For
simplicity, we assume that our monotone events are monotone predicates and consider
a sequence of Boolean predicates $a_i$ indicating whether the corresponding event holds (i.e.,
$a_i$ is true if and only if the event holds; equivalently, $\neg a_i$ is true if and only if it does not)
with the property that $\neg a_i \Rightarrow \neg a_{i+1}$ (the latter guarantees monotonicity).

As an example, consider the monotone event $a_i$ that after the ith query to a uniformly
random function, all distinct inputs result in distinct outputs (i.e., there exist no output
collisions). It is not difficult to see that $A = \{a_i\}$ is a monotone binary output as
$\neg a_i \Rightarrow \neg a_{i+1}$ and $a_i$ is completely determined from $(x^i, y^i)$. Equivalently, if there is an output
collision at the ith step, there is also an output collision at all the subsequent steps.
The monotonicity condition gives rise to a sequence of binary probabilities
$P^F_{a_i|X^iY^i} \in \{0, 1\}$ with the property that

$$\forall i \ge 1, \quad P^F_{a_i|X^iY^i} = 1 \implies P^F_{a_{i-1}|X^{i-1}Y^{i-1}} = 1. \qquad (2)$$
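
The no-output-collision example can be made concrete as follows. This is a sketch under the assumption that a history is given as two equal-length lists of queries and responses; both function names are hypothetical. The second helper checks, for one given history, that a predicate never switches back from false to true on a longer prefix.

```python
def no_output_collision(xs, ys):
    """Predicate a_i: distinct inputs in the history have distinct outputs."""
    first_input_for = {}
    for x, y in zip(xs, ys):
        if y in first_input_for and first_input_for[y] != x:
            return False  # two different inputs produced the same output
        first_input_for.setdefault(y, x)
    return True


def is_monotone_on(xs, ys, predicate):
    """True if, along this history, the predicate never holds again
    after it has failed on a shorter prefix."""
    failed = False
    for i in range(1, len(xs) + 1):
        holds = predicate(xs[:i], ys[:i])
        if failed and holds:
            return False
        failed = failed or not holds
    return True
```

A predicate that only looks at the latest query/response pair, such as "the last response differs from the last query", is in general not monotone in this sense.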

Associated to a random system with a monotone binary output, we have the following
data:

- D.0 (data defining F): these are simply the probability distributions $P^F_{Y_i|X^iY^{i-1}}$,

- D.1 (binary probabilities for A): these are the binary probabilities $P^F_{a_i|X^iY^i}$
(describing the predicates $a_i$ and $\neg a_i$) satisfying (2).

Remark 5. In the case of monotone conditions, the defining probabilities $P^F_{a_i|X^iY^i}$ can
be arbitrary real numbers in the interval [0,1].

We can derive various other probabilities using conditional probabilities/Bayes’ rule as
well as D.0 and D.1:

Event Probabilities for A: These are the probabilities denoted by $P^F_{a_i|a_{i-1}X^iY^{i-1}}$.
Intuitively, $P^F_{a_i|a_{i-1}X^iY^{i-1}}$ models the probability of the predicate $a_i$ conditioned on
the query/response history, as well as on the predicate $a_{i-1}$. We derive it from D.0 and
D.1 as follows:

$$P^F_{a_i|a_{i-1}X^iY^{i-1}} = \sum_{y_i} P^F_{a_i|X^iY^i} \, P^F_{Y_i|X^iY^{i-1}}.$$
Here, one can also derive the probability distributions $P^F_{\neg a_i|a_{i-1}X^iY^{i-1}}$ simply as
$1 - P^F_{a_i|a_{i-1}X^iY^{i-1}}$. It is important to note that if the condition $a_i$ evaluates to
false for all $y_i$ for a given $(x^i, y^{i-1})$, this probability is set to zero (for reasons that
will become clear later). We remark that in a similar manner, one can adjoin yet another
monotone condition B to a random system with a monotone condition A.

A Random System Conditioned on A not Failing (denoted by F|A): These are
probability distributions $P^F_{Y_i|a_iX^iY^{i-1}}$ and can be derived from Bayes’ rule as follows:

$$P^F_{Y_i|X^iY^{i-1}} \, P^F_{a_i|X^iY^i} = P^F_{a_iY_i|X^iY^{i-1}} = P^F_{Y_i|a_iX^iY^{i-1}} \, P^F_{a_i|X^iY^{i-1}},$$

where the middle term (which has not been defined yet) is a formal symbol for the
corresponding probability. Assuming that $P^F_{a_{i-1}|X^{i-1}Y^{i-1}} = 1$ together with the
monotonicity of A, we see that $P^F_{a_i|X^iY^{i-1}} = P^F_{a_i|a_{i-1}X^iY^{i-1}} \ne 0$. One can thus derive the
conditional probabilities

$$P^F_{Y_i|a_iX^iY^{i-1}} = \frac{P^F_{Y_i|X^iY^{i-1}} \, P^F_{a_i|X^iY^i}}{P^F_{a_i|a_{i-1}X^iY^{i-1}}}.$$

Intuitively, this looks like a random system except that we have conditioned on the
predicate $a_i$. Note that this need not be a probability distribution: for instance, consider the
example of a random function $R : \{0,1\}^n \to \{0,1\}^n$ and define $a_i$ as the event that no
collision between an input and an output has occurred. It might occur that $x_2 = y_1$, in which
case $a_2$ will always evaluate to false and thus the probability $P^R_{Y_2|a_2X^2Y^1} = 0$
for all $y_2$, so it will not represent a well-defined distribution on the variable $Y_2$. In cases
when this degeneracy does not occur, we can consider F|A as a true random system G
(see Hypothesis 8), denoted $F|A \equiv G$. Note that this particular notion of equivalence
of a random system and a random system with a monotone condition can be extended
slightly in the case of degeneracies too. As described in [?, Defn. 6], we say that F|A
is equivalent to G if $P^F_{Y_i|a_iX^iY^{i-1}} = P^G_{Y_i|X^iY^{i-1}}$ for any i and any values of the
parameters for which $P^F_{Y_i|a_iX^iY^{i-1}}$ is not identically zero (i.e., is a distribution). Finally,
we note that F|A appeared in, e.g., [?, Defn. 7].
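
The degeneracy can be seen on a toy domain. The sketch below assumes that the condition means "no query value coincides with any response value so far" and that the two queries are distinct; the function names and the integer domain are introduced here for illustration only. It lists, for each candidate second response, the probability that a uniformly random function returns it and the condition still holds.

```python
from fractions import Fraction


def a_holds(xs, ys):
    # Assumed reading of the condition: no input equals any output.
    return not (set(xs) & set(ys))


def joint_probs_y2_and_a2(x1, y1, x2, n):
    """P[Y_2 = y_2 and a_2 | x_1, x_2, y_1] for y_2 = 0..n-1, for a uniformly
    random function on a domain of size n and distinct queries x1 != x2."""
    return [
        Fraction(1, n) if a_holds([x1, x2], [y1, y2]) else Fraction(0)
        for y2 in range(n)
    ]
```

With `x2 == y1` every entry is zero, so no normalization can turn the list into a distribution; with `x2 != y1` the nonzero entries can be rescaled to sum to one.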

A Random System with a Condition A (denoted by $F^A$): This is the random system
corresponding to the condition A (see [?, Defn. 6]) and can be derived by

$$P^F_{a_iY_i|a_{i-1}X^iY^{i-1}} = P^F_{Y_i|a_iX^iY^{i-1}} \, P^F_{a_i|a_{i-1}X^iY^{i-1}}.$$

We also define

$$P^F_{a_iY^i|X^i} := \prod_{j=1}^{i} P^F_{a_jY_j|a_{j-1}X^jY^{j-1}}.$$


Moreover, we consider distinguishers trying to provoke the negated event $\neg a_i$, again
via a sequence of probability distributions. To indicate the link with $a_i$, we denote
these distributions by $P^D_{X_i|a_{i-1}X^{i-1}Y^{i-1}}$. As in the case of true random systems, this
models the probability distribution of an adversary choosing the ith query based on the
previous responses and the predicate $a_{i-1}$ (meaning that the desired event $\neg a_{i-1}$ has
not occurred after the $(i-1)$st query/response pair).


Understanding Adaptivity: Random Systems Revisited 321

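Using this data, we can derive various probabilities and distributions by imposing Bayes’
rule. We define the probabilities for the random experiment $D \diamond F$ by

$$P^{D \diamond F}_{a_iX_iY_i|a_{i-1}X^{i-1}Y^{i-1}} := P^F_{a_iY_i|a_{i-1}X^iY^{i-1}} \, P^D_{X_i|a_{i-1}X^{i-1}Y^{i-1}}.$$

Intuitively, this models the probability of choosing a particular query, obtaining a
particular response and the predicate $a_i$ (resp., $\neg a_i$) conditioned on the history and the
predicate $a_{i-1}$. Finally, let

$$P^{D \diamond F}_{a_iX^iY^i} := \prod_{j=1}^{i} P^{D \diamond F}_{a_jX_jY_j|a_{j-1}X^{j-1}Y^{j-1}}.$$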
Similarly, we define an expression for $\neg a_i$. We are now ready to define the advantage
of the distinguisher (adversary) D in provoking the desired event $\neg a_i$:

Definition 6. Let C be a class of distinguishers D that are trying to provoke $\neg a_i$. Given
$i > 0$, define $\nu^D(F, \neg a_i)$ to be the advantage of the distinguisher D in provoking the
event $\neg a_i$ in the random experiment $D \diamond F$. That is,

$$\nu^D(F, \neg a_i) := \sum_{(x^i,y^i)} P^{D \diamond F}_{\neg a_iX^iY^i}.$$

Furthermore, for all $i \ge 1$, define $\nu^C(F, \neg a_i) := \max_{D \in C} \nu^D(F, \neg a_i)$ to be the maximum
advantage over all distinguishers in the class C trying to provoke $\neg a_i$.

Finally, we explain the analogue (in the context of random systems) of the fundamental
lemma of game-playing and comment on why the random-system statement is more
general. Suppose that F is a random system with a monotone condition A and let G be
another random system. The analogue of the hypothesis of the fundamental lemma of
game-playing (that two games are equivalent up to statements that are evaluated only
if $a_i$ is set to true) is simply $F|A \equiv G$. Under that hypothesis, we expect that one
can bound the distinguishing advantage $\Delta^D_i(F, G)$ via the advantage $\nu^D(F, \neg a_i)$ of
an adversary to provoke $\neg a_i$. Interestingly enough, one can deduce the latter from a
weaker hypothesis, namely the hypothesis that

$$P^F_{a_jY^j|X^j} \le P^G_{Y^j|X^j} \quad \forall j \le i.$$

The following lemma is proven in [?, Lem. 6] (see also [?, Thm. 1]):

Lemma 7. Assume that $P^F_{a_jY^j|X^j} \le P^G_{Y^j|X^j}$ holds for all $j \le i$. Then for any
distinguisher D,

$$\Delta^D_i(F, G) \le \nu^D(F, \neg a_i).$$

In the following sections, we develop techniques to upper bound $\nu^{\mathsf{Ad}}(F, \neg a_i)$.
3 A Standard Method for Probabilistic Analysis of Adaptive
Adversaries


Let A be a monotone condition and let F be an $(\mathcal{X},\mathcal{Y})$-random system. Our goal is to
compute an upper bound for $\nu^{\mathsf{Ad}}(F, \neg a_i)$. The standard way to deal with the overall
probability of setting $\neg a_i$ is to bound it by a sum (over $j \le i$) of the maximum (over all
adversaries) probability of winning at the jth step, where these “stepwise” probabilities
are only taken over the probability distributions describing F. In other words, for each j,
we maximize individually the probability of winning at the jth step assuming that we
have not won at step $j - 1$. This greedy-type approach for producing an upper bound
can be formalized in Proposition 9 (see Appendix of the full version for its proof). We
first state a hypothesis that is commonly used throughout the paper.

Hypothesis 8. Let F be an $(\mathcal{X},\mathcal{Y})$-random system and let A be a monotone condition
on F. There exists an $(\mathcal{X},\mathcal{Y})$-random system G such that $F|A \equiv G$, i.e., for all $i \ge 1$
and all $(x^i, y^i) \in \mathcal{X}^i \times \mathcal{Y}^i$,

$$P^F_{Y_i|a_iX^iY^{i-1}} = P^G_{Y_i|X^iY^{i-1}}.$$

Proposition 9. Let F be an $(\mathcal{X},\mathcal{Y})$-random system and let A be a monotone condition
on F. Assuming that $\max_{(x^j,y^{j-1})} \{ P^F_{\neg a_j|a_{j-1}X^jY^{j-1}} \} < 1$, we have

$$\nu^{\mathsf{Ad}}(F, \neg a_i) \le \sum_{j=1}^{i} \max_{(x^j,y^{j-1})} \{ P^F_{\neg a_j|a_{j-1}X^jY^{j-1}} \}.$$
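
To see what a stepwise bound of this shape gives in a familiar case, the sketch below takes a uniformly random function with range size n and the no-output-collision condition. Given no collision so far, the largest conditional probability of a collision at step j is (j-1)/n, reached when all earlier queries were distinct and the jth query is fresh. Summing these maxima yields the usual birthday-type bound, which can be compared with the exact collision probability for i distinct queries. The function names and the choice of example are assumptions made here; the sketch is not taken from the paper.

```python
from fractions import Fraction


def stepwise_bound(n, i):
    # Sum over j = 1..i of the maximal stepwise failure probability (j-1)/n.
    return sum(Fraction(j - 1, n) for j in range(1, i + 1))


def exact_collision_prob(n, i):
    # Exact probability of an output collision among i distinct queries.
    no_collision = Fraction(1)
    for j in range(1, i + 1):
        no_collision *= 1 - Fraction(j - 1, n)
    return 1 - no_collision
```

For n = 16 and i = 4 the bound is 3/8 while the exact value is 683/2048, so the greedy sum is valid but not tight.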


4 When Adaptivity Does Not Help 

4.1 Revisiting the Result of Maurer and Pietrzak 

Maurer [?] and Pietrzak [?] provide a general method for proving that under certain
hypotheses, adaptive strategies are no better than non-adaptive ones in forcing a
condition to fail. In other words, if these hypotheses are satisfied, the advantages of the best
Ad- and NAd-distinguishers are equal. Here, we show that the hypothesis (Hypothesis 8)
used by Pietrzak is not sufficient for the comparison result of [?] (the ANA switching
lemma) to hold by providing a particular counterexample in Proposition 10 where the
hypothesis is clearly satisfied and where adaptivity does help. We then explain the
problem in the ANA switching lemma in detail and suggest different ways to remedy it in
Section 4.2. The following statement appears in [?, Lem. 6]:


Adaptive-Non-Adaptive (ANA) Switching Lemma. Let A be a monotone condition
and let F be an $(\mathcal{X},\mathcal{Y})$-random system. If Hypothesis 8 holds for F, A and an
$(\mathcal{X},\mathcal{Y})$-random system G, then adaptivity does not help in provoking $\neg a_i$. More precisely,

$$\nu^{\mathsf{Ad}}(F, \neg a_i) = \nu^{\mathsf{NAd}}(F, \neg a_i).$$
Now, we present an example of a random system where adaptive adversaries have better
advantage than the non-adaptive ones in provoking a well-defined monotone event.

Proposition 10. Let $\mathcal{X} = \{0,1\}^n$ and let $P : \mathcal{X} \to \mathcal{X}$ be a uniformly random
permutation. Let $a_i$ be the event that $y_j \ne x_j$ for all $j \le i$ where $y_j = P(x_j)$. Then A is
monotone and

$$\nu^{\mathsf{Ad}}(P, \neg a_i) > \nu^{\mathsf{NAd}}(P, \neg a_i).$$

Proof. The sequence of predicates $\{a_i\}$ is monotone by definition. We calculate the
probability of obtaining a fixed point for P after at most two queries; the case for general
i follows by inspection. After querying P with any $X_1 = x_1 \in \{0,1\}^n$, the response
$y_1 \in \{0,1\}^n$ is uniformly random. Thus, with probability $1/2^n$ a fixed point is found
after the first query. Hence,

$$P^P[Y_2 = x_2 \vee Y_1 = x_1] = P^P[Y_2 = x_2 \wedge Y_1 \ne x_1] + P^P[Y_1 = x_1]
= P^P[Y_2 = x_2 \wedge Y_1 \ne x_1] + 1/2^n.$$

The distinction between an adaptive and a non-adaptive strategy shows up after the sec- 
ond query: the latter commits the second query in advance whereas the former chooses 
it adaptively based on the first query and its response. 

Case 1: Non-adaptive adversary. If the adversary were non-adaptive, she would have
fixed $x_2 \ne x_1$ prior to obtaining the response $y_1$ and since $P(x_2) \ne P(x_1)$ and P is a
uniformly random permutation, $P(x_2) \in \{0,1\}^n - \{y_1\}$. Note however that if $x_2 = y_1$,
no $y_2$ could lead to a fixed point. Hence (by Bayes’ rule),

$$P^P[Y_2 = x_2 \wedge Y_1 \ne x_1] = P^P[Y_2 = x_2 \mid Y_1 \ne x_1, x_2] \, P^P[Y_1 \ne x_1, x_2].$$

Clearly, $P^P[Y_1 \ne x_1, x_2] = (2^n - 2)/2^n$. Moreover, $y_2$ is uniformly random among
$\{0,1\}^n - \{y_1\}$, so

$$P^P[Y_2 = x_2 \wedge Y_1 \ne x_1] = \frac{1}{2^n - 1} \cdot \frac{2^n - 2}{2^n}
\implies P^P[Y_2 = x_2 \vee Y_1 = x_1] = \frac{1}{2^n - 1} \cdot \frac{2^n - 2}{2^n} + \frac{1}{2^n} < \frac{1}{2^{n-1}}.$$

Since the above analysis holds for any non-adaptive adversary, we conclude that
$\nu^{\mathsf{NAd}}(P, \neg a_2) < 1/2^{n-1}$.

Case 2: Adaptive adversary. Knowing $x_1$, $y_1$ and $y_1 \ne x_1$ from the first query, an
adaptive adversary can eliminate one choice for the second query $x_2$ different from $x_1$,
namely $x_2 = y_1$. Thus, a clever adversary will choose $x_2 \in \{0,1\}^n - \{x_1, y_1\}$ so that
the chance of finding a fixed point after the second step is $1/(2^n - 1)$. Thus,

$$P^P[Y_2 = x_2 \wedge Y_1 \ne x_1] = P^P[Y_2 = x_2 \mid Y_1 \ne x_1 \wedge Y_1 \ne x_2] \, P^P[Y_1 \ne x_1 \wedge Y_1 \ne x_2]
= \frac{1}{2^n - 1} \cdot \frac{2^n - 1}{2^n},$$
and we conclude that

$$\nu^{\mathsf{Ad}}(P, \neg a_2) \ge \frac{1}{2^n} + \frac{1}{2^n} = \frac{1}{2^{n-1}} > \nu^{\mathsf{NAd}}(P, \neg a_2).$$

□
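
The two-query computation in the proof can be checked by brute force on a small domain. The sketch below is illustrative only: the function names and the use of a generic domain size N (instead of a power of two) are choices made here, and the adaptive strategy implemented is the one described in Case 2. It enumerates all permutations, so it is only meant for very small N with N at least 3.

```python
from fractions import Fraction
from itertools import permutations, product


def nonadaptive_best(N):
    """Best probability of hitting a fixed point with two queries that are
    both fixed before any response is seen."""
    perms = list(permutations(range(N)))
    best = Fraction(0)
    for x1, x2 in product(range(N), repeat=2):
        wins = sum(1 for p in perms if p[x1] == x1 or p[x2] == x2)
        best = max(best, Fraction(wins, len(perms)))
    return best


def adaptive_clever(N):
    """Success probability of the adaptive strategy: query x1 = 0, then
    pick a second query that differs from both x1 and the response y1."""
    perms = list(permutations(range(N)))
    wins = 0
    for p in perms:
        x1 = 0
        y1 = p[x1]
        if y1 == x1:
            wins += 1  # fixed point already at the first query
            continue
        x2 = next(x for x in range(N) if x not in (x1, y1))
        if p[x2] == x2:
            wins += 1
    return Fraction(wins, len(perms))
```

For N = 4 the non-adaptive optimum is 5/12 while the adaptive strategy reaches 1/2, matching the closed forms (2N-3)/(N(N-1)) and 2/N from the two cases.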


We now explain why Hypothesis 8 holds for the monotone event sequence A and the
random system P.

Proposition 11. Let P and A be as in Proposition 10. Then, Hypothesis 8 holds using
the monotone condition A, along with taking P as F.

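Proof. Let $i = 2$. We simply need to define the distributions (i) $P^G_{Y_1|X^1}$ for all $y_1 \in \mathcal{Y}$
and $x^1 \in \mathcal{X}^1$, and (ii) $P^G_{Y_2|X^2Y^1}$ for all $y_2 \in \mathcal{Y}$, $x^2 \in \mathcal{X}^2$ and $y^1 \in \mathcal{Y}^1$. For (i), define

$$P^G_{Y_1|X^1} =
\begin{cases}
\frac{1}{2^n - 1} & \text{if } y_1 \ne x_1,\\
0 & \text{otherwise.}
\end{cases}$$

Clearly, $P^P_{Y_1|a_1X^1} = P^G_{Y_1|X^1}$. For (ii), assuming $y_1 \ne x_1$ and $x_1 \ne x_2$, define the
distribution in the following two cases:

Case 1: $x_2 = y_1$. There are $2^n - 1$ possible values for $y_2 = P(x_2)$ occurring with equal
probabilities and none of these values can lead to a fixed point, so we have

$$P^G_{Y_2|X^2Y^1} =
\begin{cases}
0 & \text{if } y_2 = y_1 = x_2,\\
\frac{1}{2^n - 1} & \text{otherwise.}
\end{cases}$$

Case 2: $x_2 \ne y_1$. Here, the case of $y_2 = x_2 \ne y_1$ causes $\neg a_2$, so one can define:

$$P^G_{Y_2|X^2Y^1} =
\begin{cases}
0 & \text{if } y_2 = y_1 \text{ or } y_2 = x_2 \ne y_1,\\
\frac{1}{2^n - 2} & \text{otherwise.}
\end{cases}$$

We easily verify that in all cases, $P^P_{Y_2|a_2X^2Y^1} = P^G_{Y_2|X^2Y^1}$. □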

4.2 Another Look at the Comparison of Adaptive vs. Non-adaptive Adversaries 

Propositions 10 and 11 show that the ANA switching lemma cannot hold as stated
in [?, Lem. 6]. We now analyze in detail the proof of the ANA switching lemma given
in [?], identify the step that causes the discrepancy and propose a fix.


The Mistake in the Original Proof [?]. The ANA switching lemma first appears
in [?, Thm. 2] with the correct hypothesis (see (1) of loc. cit.), but without a proof. A
slightly different version referring to the original claim is given in [?, Prop. 2] (again
without a proof). The only proof, to the best of our knowledge, appears in [?, Lem. 6]
and is based on a chain of equalities and inequalities starting with


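$$1 - \nu^{\mathsf{Ad}}(F, \neg a_i) = \min_{D \in \mathsf{Ad}} \sum_{(x^i,y^i)} \prod_{j=1}^{i} P^{D \diamond F}_{a_jX_jY_j|a_{j-1}X^{j-1}Y^{j-1}}.$$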



Similarly to Proposition 9, the proof is based on applying Bayes’ rule to $P^F_{Y_ja_j|X^jY^{j-1}}$.
The application of Bayes’ rule in [?, Lem. 6] is, however, incorrect$^2$. The correct
application yields (assuming that the conditional distributions are well-defined)

$$P^F_{Y_ja_j|X^jY^{j-1}} = P^F_{Y_j|a_jX^jY^{j-1}} \, P^F_{a_j|X^jY^{j-1}}.$$

The problem is that the term $P^F_{a_i|X^i} = \prod_{j=1}^{i} P^F_{a_j|X^jY^{j-1}}$ is assumed to be independent
of $y^{i-1}$ (see the top line of [?, p. 30], step (2.26)). There is no reason why (for a fixed
$x^i$) this term should be independent of $Y^{i-1}$; yet, this is used implicitly in the argument.
We have seen in Proposition 10 that the probability $P^P_{a_2|X^2Y^1}$ depends on $Y^1$, so the
ANA switching lemma does not apply.


Strengthening the Hypotheses. We now propose a simple fix to the ANA switching
lemma by adding an extra hypothesis, essentially stating that the probability of
achieving a success on the jth query is independent of the answers to all the previous
queries. This statement (albeit in a different formulation) already appears as (1) in
Maurer’s original [?, Thm. 2], as well as in a rephrased reproduction [?, Prop. 2]. Neither of
these statements comes with a proof and both omit mention of Hypothesis 8, although
in [?, Thm. 2] an alternative condition (2) is given such that (2) is claimed to imply
both (1) and Hypothesis 8.

Our proof of Theorem 12 follows largely along the lines of the (incorrect) proof
of Pietrzak, but with fixes applied where necessary. Here, Hypothesis 8 is
needed to guarantee that all conditional probabilities $P^F_{Y_j|a_jX^jY^{j-1}}$ are well-defined and
are also distributions when considered as functions of $y_j \in \mathcal{Y}$. The second hypothesis
simply says that if there is no dependency of the conditionals
$P^F[a_j \mid a_{j-1} \wedge X^j = x^j \wedge Y^{j-1} = y^{j-1}]$ on the previous outputs, then adaptivity should not help at all.

Theorem 12. Let F be an $(\mathcal{X},\mathcal{Y})$-random system and let A be a monotone condition
on F. Let $i > 0$ be an integer. Suppose that Hypothesis 8 holds for F and A. If, in
addition, for every $j \le i$ and $x^j \in \mathcal{X}^j$, $P^F[a_j \mid a_{j-1} \wedge X^j = x^j \wedge Y^{j-1} = y^{j-1}]$ is
independent of $y^{j-1} \in \mathcal{Y}^{j-1}$, then adaptivity does not help in provoking $\neg a_i$, i.e.,

$$\nu^{\mathsf{Ad}}(F, \neg a_i) = \nu^{\mathsf{NAd}}(F, \neg a_i).$$

Proof of Theorem 12. We first note that $\nu^{\mathsf{Ad}}(F, \neg a_i) \ge \nu^{\mathsf{NAd}}(F, \neg a_i)$ holds. The rest
of the proof follows by showing the other direction of the inequality; we have that
$1 - \nu^{\mathsf{Ad}}(F, \neg a_i)$ equals

2 Furthermore, the argument in [?, Lem. 6] does not state whether the conditional probabilities
$P^F_{Y_j|a_jX^jY^{j-1}}$ are well-defined, for all $j \le i$.
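$$\begin{aligned}
&\min_{D \in \mathsf{Ad}} \sum_{(x^i,y^i)} \prod_{j=1}^{i} P^F_{Y_j|a_jX^jY^{j-1}} \, P^F_{a_j|a_{j-1}X^jY^{j-1}} \, P^D_{X_j|a_{j-1}X^{j-1}Y^{j-1}} \\
&\overset{(*)}{=} \min_{D \in \mathsf{Ad}} \sum_{x^i} \Big[ \prod_{j=1}^{i} P^F_{a_j|a_{j-1}X^j} \Big] \sum_{y^i} \Big[ \prod_{j=1}^{i} P^G_{Y_j|X^jY^{j-1}} \, P^D_{X_j|a_{j-1}X^{j-1}Y^{j-1}} \Big] \\
&\ge \min_{x^i} \Big[ \prod_{j=1}^{i} P^F_{a_j|a_{j-1}X^j} \Big] \\
&= \min_{D \in \mathsf{NAd}} \sum_{x^i} \Big[ \prod_{j=1}^{i} P^F_{a_j|a_{j-1}X^j} \Big] P^D_{X^i} \\
&= 1 - \nu^{\mathsf{NAd}}(F, \neg a_i),
\end{aligned}$$

where the inequality holds because, for each D, the inner sum over $y^i$ defines a probability
distribution on $x^i$.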

Here, (*) uses Hypothesis 8 as well as the extra hypothesis that $P^F_{a_j|a_{j-1}X^jY^{j-1}}$ is
independent of $y^{j-1}$. Hence, $\nu^{\mathsf{NAd}}(F, \neg a_i) \ge \nu^{\mathsf{Ad}}(F, \neg a_i)$ and the claim follows. □


5 Towards Obtaining Better Bounds 

5.1 Using an Auxiliary Flag 

The standard approach given in Section 3 has the disadvantage that for more complex
constructions, the maximal probabilities can get too large. This is often due to the fact
that the maximum is achieved for rather degenerate values of $(x^i, y^i)$ that occur with
very low probability. Assuming that one can bound the probability of the degeneracy,
one way to refine the analysis of the adaptive adversary is to introduce an auxiliary
event (flag) that is set only for non-degenerate pairs $(x^i, y^i)$. More precisely, if $a_i$ is
the monotone event to be studied, we introduce a flag event $b_i$ (together with a
corresponding predicate $b_i$ indicating whether the flag event has occurred or not) and we use
the fact that

$$\neg a_i \Leftrightarrow (\neg a_i \wedge b_i) \vee (\neg a_i \wedge \neg b_i) \Rightarrow (\neg a_i \wedge b_i) \vee \neg b_i.$$

Now, bounding the advantage of achieving $\neg a_i$ amounts to bounding the advantage of
achieving $\neg a_i \wedge b_i$ together with bounding the probability of degeneracy (i.e., of $\neg b_i$).
The latter can be done via Proposition 9, yet for the former we need to introduce new
definitions.

All this can be rigorously modeled using random systems as follows: suppose that
F is a random system with a monotone condition B (here, B represents the flag event).
Suppose further that F|B is equivalent to another random system G (i.e., $F|B \equiv G$).
Now, we simply impose a monotone condition A on G. Equivalently, we need to specify
the corresponding probabilities and distributions from Section 2. Suppose that we are
given the following data:
- Event probabilities $P^G_{a_i|a_{i-1}X^iY^{i-1}}$, also denoted by $P^F_{a_i|a_{i-1}b_iX^iY^{i-1}}$ (to indicate
better what they are supposed to model),

- The random system G|A, namely, probabilities $P^G_{Y_i|a_iX^iY^{i-1}}$ that we also denote
by $P^F_{Y_i|a_ib_iX^iY^{i-1}}$,

- Distinguisher relative to A, namely, probability distributions denoted by
$P^D_{X_i|a_{i-1}b_{i-1}X^{i-1}Y^{i-1}}$.

This data allows us to upper bound the advantage $\nu^D(F, \neg a_i \wedge b_i)$ (by defining
$P^{D \diamond G}$) following exactly the same steps as in Section 2 (for the random system G and
the monotone event A). Moreover, we assume all the corresponding notation. The
following proposition provides an upper bound on the adaptive advantage (see Appendix
of the full version for its proof):

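Proposition 13. Let F be a random system with a monotone condition B with the
property that there exists a random system G such that $F|B \equiv G$. Let A be a monotone
condition on G. Assuming that $\max_{(x^j,y^{j-1})} \{ P^F_{\neg a_j|a_{j-1}b_jX^jY^{j-1}} \} < 1$, we have

$$\nu^{\mathsf{Ad}}(F, \neg a_i \wedge b_i) \le \sum_{j=1}^{i} \max_{(x^j,y^{j-1})} \{ P^F_{\neg a_j|a_{j-1}b_jX^jY^{j-1}} \}.$$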

5.2 Improving the Bounds Obtained from Step-Specific Maximization 

The greedy approach based on step-specific maximization often has limitations in the
sense that the produced bounds are not tight enough. One can obtain better bounds
via the simple observation that the advantage of an adversary in provoking $\neg a_i$ for a
monotone event A can be bounded by the sum of the event probabilities for the negated
events $\neg a_j$ for $j \le i$ that are part of the data defining the monotone condition A.
Consequently, if one is able to provide upper bounds on these sums, one would automatically
obtain an upper bound on the adaptive advantage.

In order to carry out this idea rigorously, we consider two methods that are formally
stated in Propositions 14 and 15 (see Appendix of the full version for the proof of the
former; the proof of the latter follows from the proofs of Propositions 13 and 14). We
first give ourselves an upper bound $B_\Sigma$ on the sum of the event probabilities and then
show that the same $B_\Sigma$ bounds the adaptive advantage as well. The second method
is a variation of the first where one uses an auxiliary event. These two techniques are
important whenever the bounds given in Propositions 9 and 13 are not sufficiently tight.
A good example of that is the analysis of an adaptive adversary trying to achieve a collision
in the compression function of [?] (see Appendix of the full version for the details).
Proposition 14. Let F be a random system with a monotone event A. If there exists a
value $B_\Sigma \in (0, 1)$ such that for all $(x^i, y^i) \in \mathcal{X}^i \times \mathcal{Y}^i$

$$\sum_{j=1}^{i} P^F_{\neg a_j|a_{j-1}X^jY^{j-1}} \le B_\Sigma,$$

then $\nu^{\mathsf{Ad}}(F, \neg a_i) \le B_\Sigma$.

The following proposition shows the natural generalization of the above proposition to
the case of auxiliary events (its proof follows from the proofs of Propositions 13 and 14):

Proposition 15. Let F be a random system with a monotone condition B with the
property that there exists a random system G such that $F|B \equiv G$. Let A be a monotone
condition on G. Suppose that there exists a value $B_\Sigma \in (0, 1)$ such that for all
$(x^i, y^i) \in \mathcal{X}^i \times \mathcal{Y}^i$

$$\sum_{j=1}^{i} P^F_{\neg a_j|a_{j-1}b_jX^jY^{j-1}} \le B_\Sigma.$$

Then $\nu^{\mathsf{Ad}}(F, \neg a_i \wedge b_i) \le B_\Sigma$.


Counting Successes. In Proposition 14 we are mainly interested in estimating the
maximal probability of the event (success) occurring once. Nevertheless, in some cases
the major monotone event A might depend on an auxiliary condition that intrinsically
requires an event (success) to occur more than once. As a simple example, consider
a generalization of the case studied in Proposition 10: let P be a uniformly random
permutation $P : \mathcal{X} \to \mathcal{X}$ for $\mathcal{X} = \{0,1\}^n$ and let $\neg a_i$ be the event that $y_j = x_j$ for
more than k values of $j \le i$, where $y_j = P(x_j)$ and k is a positive integer. More
precisely, $a_i$ is the predicate that there exist at most k fixed points after the ith query.

Such a general problem can be modeled and studied using random systems as
follows: suppose that F is an $(\mathcal{X},\mathcal{Y})$-random system. We then attach an event called $\mathsf{hit}_i$ to
the random system F; this is the success event at step i. Note that $\mathsf{hit}_i$ is not monotone.
Moreover, we introduce a random variable $\mathsf{ctr}_i$ to indicate the number of successes up
to step i. In other words, $\mathsf{ctr}_0 = 0$ and for every $j \ge 1$, $\mathsf{ctr}_j = \mathsf{ctr}_{j-1} + 1$ if $\mathsf{hit}_j$ occurs
and $\mathsf{ctr}_j = \mathsf{ctr}_{j-1}$ otherwise. Finally, we can associate monotone events $A_k = \{a_{k,i}\}$
for every integer $k \ge 0$, so that $a_{k,i}$ is the event that there are at most k successes after the
ith query. In other words, $a_{k,i}$ is the event that $\mathsf{ctr}_i \le k$.
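
The bookkeeping with hits and counters can be sketched as follows, taking the fixed-point example (a hit at step j means the response equals the query) as the success event. The function names and the list-based history are assumptions made for illustration.

```python
def counters(xs, ys):
    """Return [ctr_1, ..., ctr_i], where a hit at step j means y_j == x_j."""
    ctr, out = 0, []
    for x, y in zip(xs, ys):
        if y == x:  # the success event occurs at this step
            ctr += 1
        out.append(ctr)
    return out


def at_most_k_successes(xs, ys, k):
    """Truth values of the predicates 'ctr_j <= k' for j = 1..i."""
    return [c <= k for c in counters(xs, ys)]
```

Because the counter never decreases, the list returned by `at_most_k_successes` can switch from True to False but never back, which is the monotonicity required of each event sequence, even though the individual hits are not monotone.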

In order to attach the success event to the random system, we provide the following
additional data to D.0:

H.1: Binary probabilities $P^F_{\mathsf{hit}_i|X^iY^i}$ for every $x^i \in \mathcal{X}^i$ and $y^i \in \mathcal{Y}^i$.

We can derive the following probabilities from D.0 and H.1 via Bayes’ rule:

- Probabilities $P^F_{\mathsf{hit}_i|X^iY^{i-1}}$ for every $x^i \in \mathcal{X}^i$ and $y^{i-1} \in \mathcal{Y}^{i-1}$ defined by

$$P^F_{\mathsf{hit}_i|X^iY^{i-1}} = \sum_{y_i} P^F_{\mathsf{hit}_i|X^iY^i} \, P^F_{Y_i|X^iY^{i-1}}.$$

- The data for each of the monotone events $A_k$.
Proposition 16 sets an upper bound on $\nu^{\mathsf{Ad}}(F, \neg a_{k,i})$ (see Appendix of the full version
for its proof).

Proposition 16. Let k be a non-negative integer and suppose that there exists a value
$B_\Sigma \in (0, 1)$ such that for all $(x^i, y^i) \in \mathcal{X}^i \times \mathcal{Y}^i$,

iz < B z and Pj^i xiY t-x > o. 

3=0 

Then $\nu^{\mathsf{Ad}}(F, \neg a_{k,i}) \le B_\Sigma^{k+1}$.

Remark 1 7. We should indicate the analogy between PronositionnHand PronositionH^I 
with Q Prop. 7] and 0 Prop. 9], respectively. We believe that having such statements 
and techniques developed in the general context of random systems could serve as a 
guiding tool for more conceptual security proofs for other constructions in the future. 
Abstract. We provide a framework enabling the construction of IBE
schemes that are secure under related-key attacks (RKAs). Specific in- 
stantiations of the framework yield RKA-secure IBE schemes for sets of 
related key derivation functions that are non-linear, thus overcoming a 
current barrier in RKA security. In particular, we obtain IBE schemes 
that are RKA secure for sets consisting of all affine functions and all poly- 
nomial functions of bounded degree. Based on this we obtain the first 
constructions of RKA-secure schemes for the same sets for the following 
primitives: CCA-secure public-key encryption, CCA-secure symmetric 
encryption and Signatures. All our results are in the standard model 
and hold under reasonable hardness assumptions. 


1 Introduction 

Related-key attacks (RKAs) were first conceived as tools for the cryptanalysis
of blockciphers [?]. However, the ability of attackers to modify keys stored
in memory via tampering [?] raises concerns that RKAs can actually be
mounted in practice. The key could be an IBE master key, a signing key of a 
certificate authority, or a decryption key, making RKA security important for a 
wide variety of primitives. 

Provably achieving security against RKAs, however, has proven extremely 
challenging. This paper aims to advance the theory with new feasibility results 
showing achievability of security under richer classes of attacks than previously 
known across a variety of primitives. 

Contributions in brief. The primitive we target in this paper is IBE. RKA
security for this primitive was defined by Bellare, Cash, and Miller [?]. As per the
founding theoretical treatment of RKAs by Bellare and Kohno [?], the definition
is parameterized by the class $\Phi$ of functions that the adversary is allowed to
apply to the target key. (With no restrictions, security is unachievable.) For future
reference we define a few relevant classes of functions over the space $\mathcal{S}$ of master

X. Wang and K. Sako (Eds.): ASIACRYPT 2012, LNCS 7658, pp. 331- 15351 2012. 

(c) International Association for Cryptologic Research 2012 


332 M. Bellare, K.G. Paterson, and S. Thomson 


keys. The set $\Phi^{c} = \{\phi_c\}_{c \in \mathcal{S}}$ with $\phi_c(s) = c$ is the set of constant functions. If $\mathcal{S}$ is
a group under an operation $*$ then $\Phi^{\mathrm{lin}} = \{\phi_a\}_{a \in \mathcal{S}}$ with $\phi_a(s) = a * s$ is the class
of linear functions. (Here $*$ could be multiplication or addition.) If $\mathcal{S}$ is a field we
let $\Phi^{\mathrm{aff}} = \{\phi_{a,b}\}_{a,b \in \mathcal{S}}$ with $\phi_{a,b}(s) = as + b$ be the class of affine functions and
$\Phi^{\mathrm{poly}(d)} = \{\phi_q\}_{q \in \mathcal{S}_d[x]}$ with $\phi_q(s) = q(s)$ the class of polynomial functions, where
$q$ ranges over the set $\mathcal{S}_d[x]$ of polynomials over $\mathcal{S}$ of degree at most $d$. RKA security
increases and is a more ambitious target as we move from $\Phi^{\mathrm{lin}}$ to $\Phi^{\mathrm{aff}}$ to $\Phi^{\mathrm{poly}(d)}$.
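
As a concrete illustration of these classes, the following minimal sketch takes the key space to be the prime field $\mathbb{Z}_p$ for a small prime (an assumption made only for this example) and builds members of each class as Python closures. The helper names `const_fn`, `lin_fn`, `aff_fn` and `poly_fn` are hypothetical and do not appear in the paper.

```python
# Toy illustration: RKD function classes over the field Z_p.
# P and all helper names are assumptions made for this sketch.
P = 101  # small prime, so that the key space Z_p is a field

def const_fn(c):
    """phi_c(s) = c : a member of the constant class."""
    return lambda s: c % P

def lin_fn(a):
    """phi_a(s) = a * s : a member of the linear class (operation = multiplication)."""
    return lambda s: (a * s) % P

def aff_fn(a, b):
    """phi_{a,b}(s) = a*s + b : a member of the affine class."""
    return lambda s: (a * s + b) % P

def poly_fn(coeffs):
    """phi_q(s) = q(s), with coeffs = [q_0, q_1, ..., q_d] (degree at most d)."""
    def phi(s):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation of q at s
            acc = (acc * s + c) % P
        return acc
    return phi

s = 17
# The classes are nested: linear is affine with b = 0, affine is polynomial
# of degree at most 1, and constants are affine with a = 0.
assert lin_fn(5)(s) == aff_fn(5, 0)(s)
assert aff_fn(5, 3)(s) == poly_fn([3, 5])(s)
assert const_fn(9)(s) == aff_fn(0, 9)(s)
```

The last assertion shows why the affine and polynomial classes contain the constant functions, a point that matters later when these classes are compared with what is achievable for PRFs.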

The choice of IBE as a primitive is not arbitrary. First, IBE is seeing a lot of 
deployment, and compromise of the master secret key would cause widespread 
damage, so we are well motivated to protect it against side-channel attacks. 
Second, IBE was shown in [?] to be an enabling primitive in the RKA domain:
achieving RKA-secure IBE for any class $\Phi$ immediately yields $\Phi$-RKA-secure
CCA-PKE (CCA-secure public-key encryption) and Sig (signature) schemes.
These results were obtained by noting that the CHK [?] IBE-to-CCA-PKE
transform and the Naor IBE-to-Sig transform both preserve RKA security. Thus, 
results for IBE would immediately have wide impact. 

We begin by presenting attacks showing that existing IBE schemes such as
those of Boneh-Franklin [?] and Waters are not RKA secure, even for $\Phi^{\mathrm{lin}}$.
This means we must seek new designs. 

We present a framework for constructing RKA-secure IBE schemes. It is an
adaptation of the framework of Bellare and Cash [?] that builds RKA-secure
PRFs based on key-malleable PRFs and fingerprinting. Our framework has two
corresponding components. First, we require a starting IBE scheme that has a
key-malleability property relative to our target class $\Phi$ of related-key deriving
functions. Second, we require the IBE scheme to support what we call
collision-resistant identity renaming. We provide a simple and efficient way to transform
any IBE scheme with these properties into one that is $\Phi$-RKA secure.

To exploit the framework, we must find key-malleable IBE schemes. Somewhat
paradoxically, we show that the very attack strategies that broke the RKA
security of existing IBE schemes can be used to show that these schemes are
$\Phi$-key-malleable, not just for $\Phi = \Phi^{\mathrm{lin}}$ but even for $\Phi = \Phi^{\mathrm{aff}}$. We additionally
show that these schemes support efficient collision-resistant identity renaming.
As a consequence we obtain $\Phi^{\mathrm{aff}}$-RKA-secure IBE schemes based on the same
assumptions used to prove standard IBE security of the base IBE schemes.

From the practical perspective, the attraction of these results is that our 
schemes modify the known ones in a very small and local way limited only 
to the way identities are hashed. They thus not only preserve the efficiency of 
the base schemes, but implementing them would require minimal and modular 
software changes, so that non-trivial RKA security may be added without much 
increase in cost. From the theoretical perspective, the step of importance here 
is to be able to achieve RKA security for non-linear functions, and this without 
extra computational assumptions. As we will see below, linear RKAs, meaning
$\Phi^{\mathrm{lin}}$-RKA security, have so far been a barrier for most primitives.

However, we can go further, providing a $\Phi^{\mathrm{poly}(d)}$-RKA-secure IBE scheme.
Our scheme is an extension of Waters' scheme [?]. The proof is under a
$q$-type hardness assumption that we show holds in the generic group model. The
Primitive | $\Phi^{\mathrm{lin}}$ | $\Phi^{\mathrm{aff}}$ | $\Phi^{\mathrm{poly}(d)}$
IBE       | [?]+[?]      | ✓   | ✓
Sig       | [?]+[?]      | ✓   | ✓
CCA-PKE   | [?], [?]+[?] | ✓   | ✓
CPA-SE    | [?], [?]+[?] | [?] | [?]
CCA-SE    | [?]+[?]      | ✓   | ✓*
PRF       | [?]          |     |

Fig. 1. Rows are indexed by primitives. Columns are indexed by the class $\Phi$ of related-
key derivation functions, $\Phi^{\mathrm{lin}}$, $\Phi^{\mathrm{aff}}$ and $\Phi^{\mathrm{poly}(d)}$ respectively. Entries indicate work
achieving $\Phi$-RKA security for the primitive in question. Checkmarks indicate results
from this paper that bring many primitives all the way to security under polynomial
RKAs in one step. The table only considers achieving the strong, adaptive notions of
security from [?]; non-adaptively secure signature schemes for non-linear RKAs were
provided in [?]. Note that symmetric key primitives cannot be RKA secure against
constant RKD functions, so affine and polynomial RKA security for the last three rows
is with respect to the RKD sets $\Phi^{\mathrm{aff}} \setminus \Phi^{c}$ and $\Phi^{\mathrm{poly}(d)} \setminus \Phi^{c}$. The * in the CCA-SE row is
because our CCA-SE construction is insecure against RKD functions where the linear
coefficient is zero, so does not achieve RKA security against the full set $\Phi^{\mathrm{poly}(d)} \setminus \Phi^{c}$.
See the full version for details.


significance of this result is to show that for IBE we can go well beyond linear 
RKAs, something not known for PRFs. 

As indicated above, we immediately get $\Phi$-RKA-secure CCA-PKE and Sig
schemes for any class $\Phi$ for which we obtained $\Phi$-RKA-secure IBE schemes,
and under the same assumptions. When the base IBE scheme has a further
malleability property, the CCA-PKE scheme so obtained can be converted into a
$\Phi$-RKA-secure CCA-SE (CCA-secure symmetric encryption) scheme. This yields
the first RKA secure schemes for the primitives Sig, CCA-PKE, and CCA-SE
for non-linear RKAs, meaning beyond $\Phi^{\mathrm{lin}}$.

Background and context. The theoretical foundations of RKA security were
laid by Bellare and Kohno [?], who treated the case of PRFs and PRPs.
Research then expanded to consider other primitives [?]. In particular,
Bellare, Cash and Miller [?] provide a comprehensive treatment including strong
definitions for many primitives and ways to transfer $\Phi$-RKA security from one
primitive to another.

RKA-security is finding applications beyond providing protection against
tampering-based side-channel attacks [?], including instantiating random
oracles in higher-level protocols and improving efficiency [?].

With regard to achieving security, early efforts were able to find PRFs with
proven RKA security only for limited $\Phi$ or under very strong assumptions.
Eventually, using new techniques, Bellare and Cash [?] were able to present
DDH-based PRFs secure against linear RKAs ($\Phi = \Phi^{\mathrm{lin}}$). But it is not clear how to
take their techniques further to handle larger RKA sets $\Phi$.
Fig. 1 summarizes the broad position. Primitives for which efforts have now
been made to achieve RKA security include CPA-SE (CPA secure symmetric
encryption), CCA-SE (CCA secure symmetric encryption), CCA-PKE (CCA
secure public-key encryption$^{1}$), Sig (Signatures), and IBE (CPA secure
identity-based encryption). Schemes proven secure under a variety of assumptions have
been provided. But the salient fact that stands out is that prior to our work,
results were all for linear RKAs with the one exception of CPA-SE where a
scheme secure against polynomial (and thus affine) RKAs was provided by [?].

In more detail, Bellare, Cash and Miller [?] show how to transfer RKA
security from PRF to any other primitive, assuming an existing standard-secure
instance of the primitive. Combining this with [?] yields DDH-based schemes
secure against linear RKAs for all the primitives, indicated by a “[?] + [?]” table
entry. Applebaum, Harnik and Ishai [?] present LPN and LWE-based
CPA-SE schemes secure against linear RKAs. Wee [?] presents CCA-PKE
schemes secure against linear RKAs. Goyal, O’Neill and Rao [?] gave a CPA-SE scheme
secure against polynomial RKAs. (We note that their result statement should
be amended to exclude constant RKD functions, for no symmetric primitive can
be secure under these.) Wee [?] (based on a communication of Wichs) remarks
that AMD codes [?] may be used to achieve RKA security for CCA-PKE, a
method that extends to other primitives including IBE (but not PRF), but with
current constructions of these codes [?], the results continue to be restricted to
linear RKAs. We note that we are interested in the stronger, adaptive versions
of the definitions as given in [?], but non-adaptively secure signature schemes
for non-linear RKAs were provided in [?].

In summary, a basic theoretical question that emerges is how to go beyond
linear RKAs. A concrete target here is to bring other primitives to parity with 
CPA-SE by achieving security for affine and polynomial RKAs. Ideally, we would 
like approaches that are general, meaning each primitive does not have to be 
treated separately. As discussed above, we are able to reach these goals with 
IBE as a starting point. 

A closer look. Informally, key-malleability means that user-level private keys
obtained by running the IBE scheme’s key derivation algorithm $\mathcal{K}$ using a
modified master secret key $\phi(s)$ (where $\phi \in \Phi$ and $s \in \mathcal{S}$, the space of master secret
keys) can alternatively be computed by running $\mathcal{K}$ using the original master
secret key $s$, followed by a suitable transformation. A collision-resistant identity
renaming transform maps identities from the to-be-constructed RKA-secure IBE
scheme back into identities in the starting IBE scheme in such a way as to
“separate” the sets of identities coming from different values of $\phi(s)$. By modifying
the starting IBE scheme to use renamed identities instead of the original ones,
we obtain a means to handle otherwise difficult key extraction queries in the
RKA setting.


1 RKAs are interesting for symmetric encryption already in the CPA case because 
encryption depends on the secret key, but for public-key encryption they are only 
interesting for the CCA case because encryption does not depend on the secret key. 
To show that the framework is applicable to the Boneh-Franklin [?] and
Waters IBE schemes with $\Phi = \Phi^{\mathrm{aff}}$ (the space of master keys here is $\mathbb{Z}_p$), we
exploit specific algebraic properties of the starting IBE schemes. In the Waters
case, we obtain an efficient, $\Phi^{\mathrm{aff}}$-RKA-secure IBE scheme in the standard model,
under the Decisional Bilinear Diffie-Hellman (DBDH) assumption. In the
Boneh-Franklin case, we obtain an efficient, $\Phi^{\mathrm{aff}}$-RKA-secure IBE scheme under the
Bilinear Diffie-Hellman (BDH) assumption with more compact public keys at
the expense of working in the Random Oracle Model. Going further, we exhibit
a simple modification of the Waters scheme which allows us to handle related key
attacks for $\Phi^{\mathrm{poly}(d)}$, this being the set of polynomial functions of bounded degree
$d$. This requires the inclusion of an extra $2d - 2$ elements in the master public
key, and a modified, $q$-type hardness assumption. We show that this assumption
holds in the generic group model.

Applying the results of [?] to these IBE schemes, we obtain the first
constructions of RKA-secure CCA-PKE and signature schemes for $\Phi^{\mathrm{aff}}$ and $\Phi^{\mathrm{poly}(d)}$.
Again, our schemes are efficient and our results hold in the standard model
under reasonable hardness assumptions. The CCA-PKE schemes, being derived via
the CHK transform [?], just involve the addition of a one-time signature and
verification key to the IBE ciphertexts and so incur little additional overhead for
RKA security. As an auxiliary result that improves on the corresponding result
of [?], we show in the full version [?] that the more efficient MAC-based transform
of [?] can be used in place of the CHK transform. The signature schemes
arise from the Naor trick, wherein identities are mapped to messages, IBE user
private keys are used as signatures, and a trial encryption and decryption on a
random plaintext are used to verify the correctness of a signature. This generic
construction can often be improved by tweaking the verification procedure, and
the same is true here: for example, for the Waters-based signature scheme, we
can base security on the CDH assumption instead of DBDH, and can achieve
more efficient verification. We stress that our signature schemes are provably
unforgeable in a fully adaptive related-key setting, in contrast to the recently
proposed signatures in [?].

Note that RKA-secure PRFs for sets $\Phi^{\mathrm{aff}}$ and $\Phi^{\mathrm{poly}(d)}$ cannot exist, since these
sets contain constant functions, and we know that no PRF can be RKA-secure in
this case [?]. Thus we are able to show stronger results for IBE, CCA-PKE and
Sig than are possible for PRF. Also, although Bellare, Cash and Miller [?] showed
that $\Phi$-RKA security for PRF implies $\Phi$-RKA security for Sig and CCA-PKE,
the observation just made means we cannot use this result to get $\Phi^{\mathrm{aff}}$ or $\Phi^{\mathrm{poly}(d)}$
RKA-secure IBE, CCA-PKE or Sig schemes. This provides further motivation
for starting from RKA-secure IBE as we do, rather than from RKA-secure PRF.

Finally we note that even for linear RKAs where IBE schemes were known
via [?] + [?], our schemes are significantly more efficient.

Further contributions. In the full version [?], as a combination of the
results of [?] and [?], we provide definitions for RKA security in the joint security
setting, where the same key pair is used for both signature and encryption
functions, and show that a $\Phi$-RKA-secure IBE scheme can be used to build a $\Phi$-RKA
and jointly secure combined signature and encryption scheme. This construction
can be instantiated using any of our specific IBE schemes, by which we obtain
the first concrete jointly secure combined signature and encryption schemes for
the RKA setting.

We also show in [?] how to adapt the KEM-DEM (or hybrid encryption)
paradigm to the RKA setting, and describe a highly efficient, $\Phi^{\mathrm{aff}}$-RKA-secure
CCA-KEM that is inspired by our IBE framework and is based on the scheme
of Boyen, Mei and Waters [?]. Our CCA-KEM’s security rests on the hardness
of the DBDH problem for asymmetric pairings $e : \mathbb{G}_1 \times \mathbb{G}_2 \to \mathbb{G}_T$; its
ciphertexts consist of 2 group elements (one in $\mathbb{G}_1$ and one in $\mathbb{G}_2$), public keys are 3
group elements (two in $\mathbb{G}_2$ and one in $\mathbb{G}_T$), encryption is pairing-free, and the
decryption cost is dominated by 3 pairing operations.

The final contribution (also in [?]) is an extension of our framework that lets
us build an RKA-secure CCA-SE scheme from any IBE scheme satisfying an
additional master public key malleability property. Such an IBE scheme, when
subjected to our transformation, meets a notion of strong $\Phi$-RKA security [?]
where the challenge encryption is also subject to RKA. Applying the CHK
transform gives a strong $\Phi$-RKA-secure CCA-PKE scheme which can be converted
into a $\Phi$-RKA-secure CCA-SE scheme in the natural way.

Paper organization. Section 2 contains preliminaries, Section 3 describes
some IBE schemes and RKA attacks on them, while Section 4 presents our
framework for constructing RKA-secure IBE schemes. Section 5 applies the framework
to specific schemes, and sketches the CCA-PKE and signature schemes that
result from applying the techniques of [?].

2 Preliminaries 

Notation. For sets $X, Y$ let $\mathrm{Fun}(X, Y)$ be the set of all functions mapping $X$
to $Y$. If $S$ is a set then $|S|$ denotes its size and $s \xleftarrow{\$} S$ the operation of picking a
random element of $S$ and denoting it by $s$. Unless otherwise indicated, an
algorithm may be randomized. An adversary is an algorithm. By $y \xleftarrow{\$} A(x_1, x_2, \ldots)$
we denote the operation of running $A$ on inputs $x_1, x_2, \ldots$ and letting $y$ denote
the outcome. We denote by $[A(x_1, x_2, \ldots, x_n)]$ the set of all possible outputs of
$A$ on inputs $x_1, x_2, \ldots, x_n$.

Games. Some of our definitions and proofs are expressed through code-based
games [?]. Recall that such a game consists of an Initialize procedure,
procedures to respond to adversary oracle queries, and a Finalize procedure. A game
$G$ is executed with an adversary $A$ as follows. First, Initialize executes and its
output is the input to $A$. Then $A$ executes, its oracle queries being answered by
the corresponding procedures of $G$. When $A$ terminates, its output becomes the
input to the Finalize procedure. The output of the latter is called the output
of the game. We let $G^{A}$ denote the event that this game output takes value true.
The running time of an adversary, by convention, is the worst case time for the
execution of the adversary with any of the games defining its security, so that
the time of the called game procedures is included.

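RKD functions and classes. We say that $\phi$ is a related-key deriving (RKD)
function over a set $\mathcal{S}$ if $\phi \in \mathrm{Fun}(\mathcal{S}, \mathcal{S})$. We say that $\Phi$ is a class of RKD functions
over $\mathcal{S}$ if $\Phi \subseteq \mathrm{Fun}(\mathcal{S}, \mathcal{S})$ and $\mathrm{id} \in \Phi$ where id is the identity function on $\mathcal{S}$. In our
constructs, $\mathcal{S}$ will have an algebraic structure, such as being a group, ring or field.
In the last case, for $a, b \in \mathcal{S}$ we define $\phi^{+}_{b}, \phi^{*}_{a}, \phi^{\mathrm{aff}}_{a,b} \in \mathrm{Fun}(\mathcal{S}, \mathcal{S})$ via $\phi^{+}_{b}(s) = s + b$,
$\phi^{*}_{a}(s) = as$, and $\phi^{\mathrm{aff}}_{a,b}(s) = as + b$ for all $s \in \mathcal{S}$. For a polynomial $q$ over field $\mathcal{S}$,
we define $\phi^{\mathrm{poly}}_{q}(s) = q(s)$ for all $s \in \mathcal{S}$. We let $\Phi^{+} = \{\phi^{+}_{b} : b \in \mathcal{S}\}$ be the class
of additive RKD functions, $\Phi^{*} = \{\phi^{*}_{a} : a \in \mathcal{S}\}$ be the class of multiplicative
RKD functions, $\Phi^{\mathrm{aff}} = \{\phi^{\mathrm{aff}}_{a,b} : a, b \in \mathcal{S}\}$ the class of affine RKD functions, and
for any fixed positive integer $d$, we let $\Phi^{\mathrm{poly}(d)} = \{\phi^{\mathrm{poly}}_{q} : \deg q \le d\}$ be the set
of polynomial RKD functions of bounded degree $d$.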

If $\phi \ne \phi'$ are distinct functions in a class $\Phi$ there is of course by definition an
$s$ such that $\phi(s) \ne \phi'(s)$, but there could also be keys $s$ on which $\phi(s) = \phi'(s)$.
We say that a class $\Phi$ is claw-free if the latter does not happen, meaning for all
distinct $\phi \ne \phi'$ in $\Phi$ we have $\phi(s) \ne \phi'(s)$ for all $s \in \mathcal{S}$. With the exception of [21],
all previous constructions of $\Phi$-RKA-secure primitives with proofs of security
have been for claw-free classes [?]. In particular, key fingerprints
are defined in [?] in such a way that their assumption of a $\Phi$-key fingerprint
automatically implies that $\Phi$ is claw-free.
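
The claw-freeness condition can be checked exhaustively on a toy example. The sketch below is only an illustration under the assumption that the key space is $\mathbb{Z}_7$; the function `is_claw_free` is a hypothetical helper, not something defined in the paper. It confirms that the additive class is claw-free while the affine class is not.

```python
# Brute-force claw-freeness check over the toy field Z_p (illustrative only).
from itertools import product

P = 7  # small prime so that exhaustive search is cheap

def is_claw_free(functions, keys):
    """True iff no two distinct functions (as maps on keys) agree on any key."""
    # Represent each function by its table of values; a set removes
    # functions that are equal as maps, leaving only distinct ones.
    tables = list({tuple(f(s) for s in keys) for f in functions})
    for i in range(len(tables)):
        for j in range(i + 1, len(tables)):
            if any(x == y for x, y in zip(tables[i], tables[j])):
                return False  # found a claw: phi(s) = phi'(s) for some s
    return True

keys = range(P)
additive = [lambda s, b=b: (s + b) % P for b in range(P)]
affine = [lambda s, a=a, b=b: (a * s + b) % P
          for a, b in product(range(P), repeat=2)]

print(is_claw_free(additive, keys))  # distinct shifts never agree on a key
print(is_claw_free(affine, keys))    # e.g. s and 2s agree at s = 0
```

The second result reflects the difficulty discussed in this paper: once the class is rich enough to contain claws, the claw-free techniques of earlier work no longer apply directly.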

IBE syntax. We specify an IBE scheme $\mathcal{IBE} = (\mathcal{S}, \mathcal{P}, \mathcal{K}, \mathcal{E}, \mathcal{D})$ by first
specifying a non-empty set $\mathcal{S}$ called the master-key space from which the master secret
key $s$ is drawn at random. The master public key $\pi \leftarrow \mathcal{P}(s)$ is then produced
by applying to $s$ a deterministic master public key generation algorithm $\mathcal{P}$. A
decryption key for an identity $u$ is produced via $dk_u \xleftarrow{\$} \mathcal{K}(s, u)$. A ciphertext
$C$ encrypting a message $M$ for $u$ is generated via $C \xleftarrow{\$} \mathcal{E}(\pi, u, M)$. A
ciphertext $C$ is deterministically decrypted via $M \leftarrow \mathcal{D}(dk_u, C)$. Correctness requires
that $\mathcal{D}(\mathcal{K}(s, u), \mathcal{E}(\pi, u, M)) = M$ with probability one for all $M \in \mathrm{MSp}$ and
all $u \in \mathrm{USp}$ where MSp, USp are, respectively, the message and identity spaces
associated to $\mathcal{IBE}$.

The usual IBE syntax specifies a single parameter generation algorithm that 
produces $s, \pi$ together, and although there is of course a space from which the
master secret key is drawn, it is not explicitly named. But RKD functions will 
have domain the space of master keys of the IBE scheme, which is why it is 
convenient in our context to make it explicit in the syntax. Saying the master 
public key is a deterministic function of the master secret key is not strictly 
necessary for us, but it helps make some things a little simpler and is true in all 
known schemes, so we assume it. 

We make an important distinction between parameters and the master public 
key, namely that the former may not depend on s while the latter might. Pa- 
rameters will be groups, group generators, pairings and the like. They will be 
fixed and available to all algorithms without being named as explicit inputs. 
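proc Initialize
  $s \xleftarrow{\$} \mathcal{S}$ ; $\pi \leftarrow \mathcal{P}(s)$
  $b \xleftarrow{\$} \{0, 1\}$
  $u^* \leftarrow \bot$ ; $I \leftarrow \emptyset$
  Ret $\pi$

proc KD($\phi$, $u$)
  $s' \leftarrow \phi(s)$
  If ($s' = s$) $I \leftarrow I \cup \{u\}$
  If ($u^* \in I$) Ret $\bot$
  Ret $dk \xleftarrow{\$} \mathcal{K}(s', u)$

proc LR($u$, $M_0$, $M_1$)
  If ($|M_0| \ne |M_1|$) Ret $\bot$
  $u^* \leftarrow u$
  If ($u^* \in I$) Ret $\bot$
  Ret $C \xleftarrow{\$} \mathcal{E}(\pi, u^*, M_b)$

proc Finalize($b'$)
  Ret ($b = b'$)


Fig. 2. Game IBE defining $\Phi$-RKA-security of IBE scheme $\mathcal{IBE} = (\mathcal{S}, \mathcal{P}, \mathcal{K}, \mathcal{E}, \mathcal{D})$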


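Boneh-Franklin:

$\mathcal{P}(s)$:
  $\pi \leftarrow g^{s}$
  Ret $\pi$
$\mathcal{K}(s, u)$:
  $dk \leftarrow H_1(u)^{s}$
  Ret $dk$
$\mathcal{E}(\pi, u, M)$:
  $t \xleftarrow{\$} \mathbb{Z}_p$
  $C_1 \leftarrow g^{t}$
  $C_2 \leftarrow H_2(e(\pi, H_1(u))^{t}) \oplus M$
  Ret $(C_1, C_2)$
$\mathcal{D}(dk, C)$:
  $M \leftarrow C_2 \oplus H_2(e(dk, C_1))$
  Ret $M$

Waters:

$\mathcal{P}(s)$:
  $\pi \leftarrow g^{s}$
  Ret $\pi$
$\mathcal{K}(s, u)$:
  $r \xleftarrow{\$} \mathbb{Z}_p$
  $dk_1 \leftarrow g_1^{s} \cdot H(u)^{r}$
  $dk_2 \leftarrow g^{r}$
  Ret $(dk_1, dk_2)$
$\mathcal{E}(\pi, u, M)$:
  $t \xleftarrow{\$} \mathbb{Z}_p$
  $C_1 \leftarrow g^{t}$
  $C_2 \leftarrow H(u)^{t}$
  $C_3 \leftarrow e(\pi, g_1)^{t} \cdot M$
  Ret $(C_1, C_2, C_3)$
$\mathcal{D}(dk, C)$:
  $M \leftarrow C_3 \cdot \dfrac{e(dk_2, C_2)}{e(dk_1, C_1)}$
  Ret $M$


Fig. 3. Boneh-Franklin IBE scheme (first) and Waters IBE scheme (second)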


RKA-secure IBE. We define $\Phi$-RKA security of IBE schemes following [?].
Game IBE of Fig. 2 is associated to $\mathcal{IBE} = (\mathcal{S}, \mathcal{P}, \mathcal{K}, \mathcal{E}, \mathcal{D})$ and a class $\Phi$ of
RKD functions over $\mathcal{S}$. An adversary is allowed only one query to LR. Let
$\mathbf{Adv}^{\mathrm{ibe\text{-}rka}}_{\mathcal{IBE},\Phi}(A)$ equal $2\Pr[\mathrm{IBE}^{A}] - 1$. A feature of the definition we draw attention
to is that the key derivation oracle KD refuses to act only when the identity
it is given matches the challenge one and the derived key equals the real one.
This not only creates a strong security requirement but one that is challenging to
achieve because a simulator, not knowing $s$, cannot check whether or not the IBE
adversary succeeded. This difficulty is easily resolved if $\Phi$ is claw-free but not
otherwise. We consider this particular RKA security definition as, in addition to
its strength, it is the level of RKA security required of an IBE scheme so that
application of the CHK and Naor transforms results in RKA-secure CCA-PKE
and signature schemes.


3 Existing IBE Schemes and RKA Attacks on Them 

The algorithms of the Boneh-Franklin BasicIdent IBE scheme [?] are given in
Figure 3. The parameters of the scheme are groups $\mathbb{G}_1, \mathbb{G}_T$ of prime order $p$, a
symmetric pairing $e : \mathbb{G}_1 \times \mathbb{G}_1 \to \mathbb{G}_T$, a generator $g$ of $\mathbb{G}_1$ and hash functions
$H_1 : \{0, 1\}^* \to \mathbb{G}_1$, $H_2 : \mathbb{G}_T \to \{0, 1\}^n$ which are modeled as random oracles in
the security analysis. Formally, these are output by a pairing parameter generator
on input $1^k$. This scheme is IND-CPA secure in the usual model for IBE security,
under the Bilinear Diffie-Hellman (BDH) assumption.

The algorithms of the Waters IBE scheme are also given in Figure 3.
The parameters of the scheme are groups $\mathbb{G}_1, \mathbb{G}_T$ of prime order $p$, a symmetric
pairing $e : \mathbb{G}_1 \times \mathbb{G}_1 \to \mathbb{G}_T$, generators $g, g_1$ of $\mathbb{G}_1$ and group elements $h_0, \ldots, h_n \in
\mathbb{G}_1$ specifying the hash function $H(u) = h_0 \prod_{i : u_i = 1} h_i$. The Waters IBE scheme is
also IND-CPA secure in the usual model for IBE security, under the DBDH
assumption.

The Waters IBE scheme is not RKA secure if $\Phi$ includes a function $\phi^{*}_{a}(s) =
as$. A call to the key derivation oracle with any such $\phi$ yields a user secret
key $(dk_1, dk_2) = (g_1^{as} \cdot H(u)^{r}, g^{r})$. Raising this to $a^{-1}$ gives $(dk_1', dk_2') = (g_1^{s} \cdot
H(u)^{ra^{-1}}, g^{ra^{-1}})$, so that $(dk_1', dk_2')$ is a user secret key for identity $u$ under
the original master secret key with randomness $r' = ra^{-1}$. An RKA adversary
can thus obtain the user secret key for any identity of his choosing and hence
break the RKA security of the Waters scheme. A similar attack applies to the
Boneh-Franklin scheme.
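
The algebra of this attack can be checked mechanically. The sketch below is an illustration under toy assumptions: instead of a pairing group it uses the order-1019 subgroup of $\mathbb{Z}_{2039}^{*}$ (tiny, insecure parameters), and the values standing in for $g_1$ and $H(u)$ are arbitrary subgroup elements. Only key derivation is modelled, since the attack never touches encryption.

```python
# Toy check of the related-key attack on Waters-style user keys.
# Group: order-Q subgroup of Z_P^* with P = 2Q + 1 (insecure toy values,
# chosen only so that the arithmetic can be verified by machine).
import secrets

P_MOD, Q = 2039, 1019          # safe prime and prime subgroup order
g = 4                          # generator of the order-Q subgroup
g1 = pow(g, 123, P_MOD)        # stands in for the second generator g_1
H_u = pow(g, 456, P_MOD)       # stands in for H(u) of one fixed identity u

def keygen(s, r):
    """Waters-style user key (dk1, dk2) = (g1^s * H(u)^r, g^r)."""
    return (pow(g1, s, P_MOD) * pow(H_u, r, P_MOD)) % P_MOD, pow(g, r, P_MOD)

s = secrets.randbelow(Q - 1) + 1      # master secret key, nonzero
a = secrets.randbelow(Q - 2) + 2      # attacker's multiplier, neither 0 nor 1
r = secrets.randbelow(Q)              # randomness chosen by the KD oracle

# Oracle answer for the RKD function phi(s) = a*s.
dk1, dk2 = keygen((a * s) % Q, r)

# The attacker raises both components to a^{-1} mod Q.
a_inv = pow(a, -1, Q)
dk1_p, dk2_p = pow(dk1, a_inv, P_MOD), pow(dk2, a_inv, P_MOD)

# The result is a valid key under the ORIGINAL s with randomness r * a^{-1}.
assert (dk1_p, dk2_p) == keygen(s, (r * a_inv) % Q)
```

The final assertion holds for every choice of $s$, $a \ne 0$ and $r$, which is the content of the attack: the attacker never learns $s$, yet ends up with a key that is correctly distributed under it.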

4 Framework for Deriving RKA-Secure IBE Schemes 

In the previous section we saw that the Boneh-Franklin and Waters schemes are 
not RKA secure. Here we will show how to modify these and other schemes to be 
RKA secure by taking advantage, in part, of the very algebra that leads to the 
attacks. We describe a general framework for creating RKA-secure IBE schemes 
and then apply it to obtain several such schemes.

We target a very particular type of framework, one that allows us to reduce 
RKA security of a modified IBE scheme directly to the normal IBE security 
of a base IBE scheme. This will allow us to exploit known results on IBE in a 
blackbox way and avoid re-entering the often complex security proofs of the base 
IBE schemes. 

Key-malleability. We say that an IBE scheme $\mathcal{IBE} = (\mathcal{S}, \mathcal{P}, \mathcal{K}, \mathcal{E}, \mathcal{D})$ is
$\Phi$-key-malleable if there is an algorithm $T$, called the key simulator, which, given $\pi$, an
identity $u$, a decryption key $dk' \xleftarrow{\$} \mathcal{K}(s, u)$ for $u$ under $s$ and an RKD function
$\phi \in \Phi$, outputs a decryption key $dk$ for $u$ under master secret key $\phi(s)$ that
is distributed identically to the output of $\mathcal{K}(\phi(s), u)$. The formalization takes a
little more care, for in talking about two objects being identically distributed one
needs to be precise about relative to what other known information this is true.
A simple and rigorous definition here can be made using games. We ask that

$$\Pr[\mathrm{KMReal}^{M}_{\mathcal{IBE},\Phi}] = \Pr[\mathrm{KMSim}^{M}_{\mathcal{IBE},\Phi,T}]$$

for all (not necessarily computationally bounded) adversaries $M$, where the
games are as follows. The Initialize procedure of both picks $s$ at random from $\mathcal{S}$
and returns $\pi \leftarrow \mathcal{P}(s)$ to the adversary. In game $\mathrm{KMReal}_{\mathcal{IBE},\Phi}$, oracle KD$(\phi, u)$
returns $dk \xleftarrow{\$} \mathcal{K}(\phi(s), u)$ but in game $\mathrm{KMSim}_{\mathcal{IBE},\Phi,T}$ it lets $dk' \xleftarrow{\$} \mathcal{K}(s, u)$ and
returns $T(\pi, u, dk', \phi)$. There are no other oracles, and Finalize$(b')$ returns
$(b' = 1)$.
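
To make the definition concrete, here is a sketch of what a key simulator could look like for Waters-style keys and affine RKD functions $\phi(s) = as + b$. The formula used for `key_simulator` is an illustration derived from the key shape $(g_1^{s} \cdot H(u)^{r}, g^{r})$ of Section 3 and is an assumption of this sketch, not a statement of the construction the paper gives; the toy group is again the order-1019 subgroup of $\mathbb{Z}_{2039}^{*}$, and $\pi$ and $u$ are left implicit because the identity is fixed.

```python
# Sketch of a key simulator T for Waters-style keys and phi(s) = a*s + b.
# All parameters are insecure toy values; the formula for T is an assumption
# made for illustration.
import secrets

P_MOD, Q = 2039, 1019
g = 4                          # generator of the order-Q subgroup
g1 = pow(g, 123, P_MOD)        # stands in for g_1
H_u = pow(g, 456, P_MOD)       # stands in for H(u) of one fixed identity u

def keygen(s, r):
    """Honest key derivation: (g1^s * H(u)^r, g^r)."""
    return (pow(g1, s, P_MOD) * pow(H_u, r, P_MOD)) % P_MOD, pow(g, r, P_MOD)

def key_simulator(dk_prime, a, b, t=None):
    """Map a key under s to a key under a*s + b WITHOUT using s.

    t is fresh randomness that re-randomizes the output, so that the result
    is uniformly distributed over valid keys even when a = 0.
    """
    if t is None:
        t = secrets.randbelow(Q)
    dk1_p, dk2_p = dk_prime
    dk1 = (pow(dk1_p, a, P_MOD) * pow(g1, b, P_MOD) * pow(H_u, t, P_MOD)) % P_MOD
    dk2 = (pow(dk2_p, a, P_MOD) * pow(g, t, P_MOD)) % P_MOD
    return dk1, dk2

s, r, a, b, t = 77, 301, 5, 42, 9
simulated = key_simulator(keygen(s, r), a, b, t)
# Identical to an honest key under a*s + b with randomness a*r + t.
assert simulated == keygen((a * s + b) % Q, (a * r + t) % Q)
```

Since $t$ is uniform, the effective randomness $ar + t$ is uniform as well, which is what the requirement of identical distributions in the two games asks for. Note that this is the same algebra as in the attack of Section 3, used constructively.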

Using KM. Intuitively, key-malleability allows us to simulate a $\Phi$-RKA
adversary via a normal adversary and would thus seem to be enough to prove $\Phi$-RKA
security of $\mathcal{IBE}$ based on its normal security. Let us see how this argument
goes and then see the catches that motivate a transformation of the scheme via
collision-resistant identity renaming. Letting $\overline{A}$ be an adversary attacking the
$\Phi$-RKA security of $\mathcal{IBE}$, we aim to build an adversary $A$ such that

$$\mathbf{Adv}^{\mathrm{ibe\text{-}rka}}_{\mathcal{IBE},\Phi}(\overline{A}) \le \mathbf{Adv}^{\mathrm{ibe}}_{\mathcal{IBE}}(A) . \qquad (1)$$

On input $\pi$, adversary $A$ runs $\overline{A}(\pi)$. When the latter makes a KD$(\phi, u)$ query,
$A$ lets $dk \leftarrow \mathrm{KD}(\mathrm{id}, u)$, where KD is $A$’s own key derivation oracle. It then
lets $\overline{dk} \leftarrow T(\pi, u, dk, \phi)$ and returns $\overline{dk}$ to $\overline{A}$. Key-malleability tells us that $\overline{dk}$
is distributed identically to an output of KD$(\phi, u)$, so the response provided
by $A$ is perfectly correct. When $\overline{A}$ makes a LR$(u, M_0, M_1)$ query, $A$ lets $C \leftarrow
\mathrm{LR}(u, M_0, M_1)$ and returns $C$ to $\overline{A}$. Finally when $\overline{A}$ halts with output a bit $b'$,
adversary $A$ does the same.

The simulation seems perfect, so we appear to have established Equation (1).
What’s the catch? The problem is avoiding challenge key derivation. Suppose $\overline{A}$
made a KD$(\phi, u)$ query for a $\phi$ such that $\phi(s) \ne s$; then made a LR$(u, M_0, M_1)$
query; and finally, given $C$, correctly computed $b$. It would win its game,
because the condition $\phi(s) \ne s$ means that identity $u$ may legitimately be used
both in a key derivation query and in the challenge LR query. But our
constructed adversary $A$, in the simulation, would make query KD$(\mathrm{id}, u)$ to answer
$\overline{A}$’s KD$(\phi, u)$ query, and then make query LR$(u, M_0, M_1)$. $A$ would thus have
queried the challenge identity $u$ to the key-extraction oracle and would not win.

This issue is dealt with by transforming the base scheme via what we call
identity renaming, so that $\Phi$-RKA security of the transformed scheme can be
proved based on the $\Phi$-key-malleability of the base scheme.

Identity renaming. Renaming is a way to map identities in the new scheme 
back to identities of the given, base scheme. Let us now say how renaming works 
more precisely and then define the modified scheme. 

Let $\mathcal{IBE} = (\mathcal{S}, \mathcal{P}, \mathcal{K}, \mathcal{E}, \mathcal{D})$ denote the given, base IBE scheme, and let USp
be its identity space. A renaming scheme is a pair (SI, PI) of functions where
SI: $\mathcal{S} \times \overline{\mathrm{USp}} \to \mathrm{USp}$ and PI: $[\mathcal{P}(\mathcal{S})] \times \overline{\mathrm{USp}} \times \Phi \to \mathrm{USp}$ where $\overline{\mathrm{USp}}$, implicitly
specified by the renaming scheme, will be the identity space of the new scheme we
will soon define. The first function SI, called the secret renaming function, uses
the master secret key, while its counterpart public renaming function PI uses
the master public key. We require that $\mathrm{SI}(\phi(s), \overline{u}) = \mathrm{PI}(\pi, \overline{u}, \phi)$ for all $s \in \mathcal{S}$, all
$\pi \in [\mathcal{P}(s)]$, all $\overline{u} \in \overline{\mathrm{USp}}$ and all $\phi \in \Phi$. This compatibility condition says that the
two functions arrive, in different ways, at the same outcome.
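
As an illustration of the compatibility condition, consider a hypothetical renaming scheme for affine RKD functions in which the master public key is $\pi = g^{s}$ and an identity is renamed by attaching a group element to it. This particular choice of SI and PI is an assumption made for this sketch; the toy group is the order-1019 subgroup of $\mathbb{Z}_{2039}^{*}$.

```python
# Hypothetical renaming scheme for affine RKD functions (illustration only).
P_MOD, Q = 2039, 1019
g = 4                                   # generator of the order-Q subgroup

def master_public_key(s):
    return pow(g, s, P_MOD)             # pi = g^s

def SI(s, u_bar):
    """Secret renaming: uses the (possibly modified) master secret key."""
    return (u_bar, pow(g, s, P_MOD))

def PI(pi, u_bar, phi_ab):
    """Public renaming for phi(s) = a*s + b: pi^a * g^b equals g^(a*s + b)."""
    a, b = phi_ab
    return (u_bar, (pow(pi, a, P_MOD) * pow(g, b, P_MOD)) % P_MOD)

# Compatibility: SI(phi(s), u_bar) == PI(pi, u_bar, phi) for all s, sample phi.
for s in range(Q):
    pi = master_public_key(s)
    for a, b in [(1, 0), (0, 5), (3, 7), (1018, 1)]:
        assert SI((a * s + b) % Q, "alice") == PI(pi, "alice", (a, b))
```

The check passes because $\pi^{a} g^{b} = g^{as+b}$: the public function reaches, from $\pi$ and the description of $\phi$, the same renamed identity that the secret function reaches from $\phi(s)$.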

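The transform. The above is all we need to specify our Identity Renaming
Transform IRT that maps a base IBE scheme $\mathcal{IBE} = (\mathcal{S}, \mathcal{P}, \mathcal{K}, \mathcal{E}, \mathcal{D})$ to a new
IBE scheme $\overline{\mathcal{IBE}} = (\mathcal{S}, \mathcal{P}, \overline{\mathcal{K}}, \overline{\mathcal{E}}, \mathcal{D})$. As the notation indicates, the master key
space, master public key generation algorithm and decryption algorithm are
unchanged. The other algorithms are defined by

$$\overline{\mathcal{K}}(s, \overline{u}) = \mathcal{K}(s, \mathrm{SI}(s, \overline{u})) \quad \text{and} \quad \overline{\mathcal{E}}(\pi, \overline{u}, M) = \mathcal{E}(\pi, \mathrm{PI}(\pi, \overline{u}, \mathrm{id}), M) .$$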

We clarify that algorithms of the new IBE scheme do not, and cannot, have as
input the RKD functions $\phi$ used by the attacker. We are defining an IBE scheme,
and algorithm inputs must follow the syntax of IBE schemes. When the new
encryption algorithm invokes PI, it sets $\phi$ to the identity function id. (Looking
ahead, the simulation will call the renaming functions with $\phi$ emanating from
the adversary attacking the new IBE scheme.) The key derivation algorithm has
$s$ but not $\pi$ (recall we cannot give it $\pi$ because otherwise it becomes subject
to the RKA) and thus uses the secret renaming function. On the other hand
the encryption algorithm has $\pi$ but obviously not $s$ and thus uses the public
renaming function. This explains why we need two, compatible renaming
functions. The new scheme has the same message space as the old one. Its identity
space is inherited from the renaming scheme, being the space $\overline{\mathrm{USp}}$ from which
the renaming functions draw their identity inputs.

The above compatibility requirement implies that $\mathrm{SI}(s, \overline{u}) = \mathrm{PI}(\pi, \overline{u}, \mathrm{id})$. From
this it follows that $\overline{\mathcal{IBE}}$ preserves the correctness of $\mathcal{IBE}$. We now go on to
specifying properties of the base IBE scheme and the renaming functions that
suffice to prove $\Phi$-RKA security of the new scheme.

A trivial renaming scheme is obtained by setting $\mathrm{SI}(s, \overline{u}) = \overline{u} = \mathrm{PI}(\pi, \overline{u}, \phi)$.
This satisfies the compatibility condition. However, the transformed IBE scheme
$\overline{\mathcal{IBE}}$ ends up identical to the base $\mathcal{IBE}$ and thus this trivial renaming cannot
aid in getting security. We now turn to putting a non-trivial condition on the 
renaming scheme that we will show suffices. 

Collision-resistance. The renaming scheme (SI, PI) will be required to have
a collision-resistance property. In its simplest and strongest form the requirement
is that

$$(\phi(s), \overline{u}_1) \ne (s, \overline{u}_2) \;\Rightarrow\; \mathrm{SI}(\phi(s), \overline{u}_1) \ne \mathrm{SI}(s, \overline{u}_2)$$

for all $s \in \mathcal{S}$, all $\overline{u}_1, \overline{u}_2 \in \overline{\mathrm{USp}}$ and all $\phi \in \Phi$. This statistical collision-resistance
will be enough to prove that $\overline{\mathcal{IBE}}$ is $\Phi$-RKA secure if $\mathcal{IBE}$ is $\Phi$-key-malleable
(cf. Theorem 1). We will now see how this goes. Then we will instantiate these
ideas to get concrete $\Phi$-RKA-secure schemes for many interesting classes $\Phi$
including $\Phi^{\mathrm{aff}}$ and $\Phi^{\mathrm{poly}(d)}$.

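Theorem 1. Let $\mathsf{IBE} = (\mathcal{S}, \mathcal{P}, \mathcal{K}, \mathcal{E}, \mathcal{D})$
be a $\Phi$-key-malleable IBE scheme with key simulator $T$. Let
$\overline{\mathsf{IBE}} = (\mathcal{S}, \mathcal{P}, \overline{\mathcal{K}}, \overline{\mathcal{E}}, \mathcal{D})$
be obtained from $\mathsf{IBE}$ and renaming scheme $(\mathsf{SI}, \mathsf{PI})$
via the transform IRT described above. Assume the renaming scheme is
statistically collision-resistant. Let $\overline{A}$ be a $\Phi$-RKA adversary
against $\overline{\mathsf{IBE}}$ that makes $q$ key derivation queries. Then
there is an adversary $A$ making $q$ key derivation queries such that

$\mathbf{Adv}^{\mathrm{ibe\text{-}rka}}_{\overline{\mathsf{IBE}},\Phi}(\overline{A}) \le \mathbf{Adv}^{\mathrm{ibe}}_{\mathsf{IBE}}(A)$ .   (2)
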
proc Initialize              // G0
000  s ←$ S ; π ← P(s)
001  b ←$ {0,1} ; ū* ← ⊥
002  I ← ∅
003  Ret π

proc Initialize              // G1, G2, G3
100  s ←$ S ; π ← P(s)
101  b ←$ {0,1} ; u* ← ⊥
102  I ← ∅
103  Ret π

proc KD(φ, ū)                // G0
010  s' ← φ(s)
011  If (s' = s) I ← I ∪ {ū}
012  If (ū* ∈ I) Ret ⊥
013  u ← SI(s', ū)
014  Ret dk ←$ K(s', u)

proc KD(φ, ū)                // G1
110  s' ← φ(s)
111  u ← SI(s', ū)
112  I ← I ∪ {u}
113  If (u* ∈ I) Ret ⊥
114  Ret dk ←$ K(s', u)

proc KD(φ, ū)                // G2
210  u ← PI(π, ū, φ)
211  I ← I ∪ {u}
212  If (u* ∈ I) Ret ⊥
213  Ret dk ←$ K(φ(s), u)

proc KD(φ, ū)                // G3
310  u ← PI(π, ū, φ)
311  I ← I ∪ {u}
312  If (u* ∈ I) Ret ⊥
313  dk ←$ K(s, u)
314  Ret dk ← T(π, u, dk, φ)

proc LR(ū, M0, M1)           // G0
020  If (|M0| ≠ |M1|) Ret ⊥
021  ū* ← ū
022  If (ū* ∈ I) Ret ⊥
023  u* ← SI(s, ū*)
024  Ret C ←$ E(π, u*, Mb)

proc LR(ū, M0, M1)           // G1
120  If (|M0| ≠ |M1|) Ret ⊥
121  u* ← SI(s, ū)
122  If (u* ∈ I) Ret ⊥
123  Ret C ←$ E(π, u*, Mb)

proc LR(ū, M0, M1)           // G2, G3
220  If (|M0| ≠ |M1|) Ret ⊥
221  u* ← PI(π, ū, id)
222  If (u* ∈ I) Ret ⊥
223  Ret C ←$ E(π, u*, Mb)

proc Finalize(b')            // All
030  Ret (b = b')


Fig. 4. Games for proof of Theorem 1


Furthermore, the running time of $A$ is that of $\overline{A}$ plus the time for
$q$ executions of $T$ and $q+1$ executions of $\mathsf{PI}$.

Proof (Theorem 1). Consider the games of Fig. 4. Game $G_0$ is written to be
equivalent to the $\Phi$-RKA game for $\overline{\mathsf{IBE}}$, so that

$\mathbf{Adv}^{\mathrm{ibe\text{-}rka}}_{\overline{\mathsf{IBE}},\Phi}(\overline{A}) = 2\Pr[G_0^{\overline{A}}] - 1$ .   (3)

In answering a $\mathrm{KD}(\phi, \overline{u})$ query, $G_0$ must use the
key-generation algorithm $\overline{\mathcal{K}}$ of the new scheme
$\overline{\mathsf{IBE}}$ but with master secret key $s' = \phi(s)$. From the
definition of $\overline{\mathcal{K}}$, it follows that not only is the
key-generation at line 014 done under $s'$, but also the identity renaming at
line 013. LR, correspondingly, should use $\overline{\mathcal{E}}$, and thus the
public renaming function $\mathsf{PI}$. The compatibility property however allows
us at line 023 to use $\mathsf{SI}$ instead. This will be useful in exploiting
statistical collision-resistance in the next step, after which we will revert
back to $\mathsf{PI}$.

The adversary $A$ we aim to construct will not know $s$. A central difficulty in
the simulation is thus lines 011, 012 of $G_0$, where the response provided to
$\overline{A}$ depends on the result of a test involving $s$, a test that $A$
cannot perform. Before we can design $A$ we must get rid of this test.
Statistical collision-resistance is what will allow us to do so. KD of game $G_1$
moves the identity renaming up to line 111, before the list of queried identities
is updated, and then, at line 112, adds the transformed identity to the list. LR
is likewise modified so its test now involves the transformed (rather than
original) identities. We claim this makes no difference, meaning

$\Pr[G_0^{\overline{A}}] = \Pr[G_1^{\overline{A}}]$ .   (4)

Indeed, statistical collision-resistance tells us that
$(s', \overline{u}) = (s, \overline{u}^*)$ iff
$\mathsf{SI}(s', \overline{u}) = \mathsf{SI}(s, \overline{u}^*)$. This means that
lines 011, 012 and lines 112, 113 are equivalent.

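Compatibility is invoked to use $\mathsf{PI}$ in place of $\mathsf{SI}$ in both
KD and LR in $G_2$, so that

$\Pr[G_1^{\overline{A}}] = \Pr[G_2^{\overline{A}}]$ .   (5)

Rather than use $s'$ for key generation as at line 213, $G_3$ uses $s$ at line 313
and then applies the key simulator $T$. We claim that key-malleability implies

$\Pr[G_2^{\overline{A}}] = \Pr[G_3^{\overline{A}}]$ .   (6)

To justify this we show that there is an adversary $M$ such that

$\Pr[\mathrm{KMReal}^{M}_{\mathsf{IBE}}] = \Pr[G_2^{\overline{A}}]$ and $\Pr[\mathrm{KMSim}^{M}_{\mathsf{IBE}}] = \Pr[G_3^{\overline{A}}]$ .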

Adversary $M$, on input $\pi$, begins with the initializations
$u^* \leftarrow \bot$ ; $I \leftarrow \emptyset$ ; $b \leftarrow_{\$} \{0,1\}$ and
then runs $\overline{A}$ on input $\pi$. When $\overline{A}$ makes a
$\mathrm{KD}(\phi, \overline{u})$ query, $M$ does the following:

$u \leftarrow \mathsf{PI}(\pi, \overline{u}, \phi)$ ; $I \leftarrow I \cup \{u\}$ ; If $(u^* \in I)$ Ret $\bot$ ; $dk \leftarrow \mathrm{KD}(\phi, u)$.

If $M$ is playing game KMReal then its KD oracle will behave as line 213 in game
$G_2$, while if $M$ is playing game KMSim its KD oracle will behave as lines 313,
314 in game $G_3$. When $\overline{A}$ makes its
$\mathrm{LR}(\overline{u}, M_0, M_1)$ query, $M$ sets
$u^* \leftarrow \mathsf{PI}(\pi, \overline{u}, \mathsf{id})$ and checks if
$u^* \in I$, returning $\bot$ if so. $M$ then computes
$C \leftarrow_{\$} \mathcal{E}(\pi, u^*, M_b)$, which it returns to $\overline{A}$.
When $\overline{A}$ halts with output $b'$, $M$ returns the result of $(b' = b)$.
If $M$ is playing game KMReal then game $G_2$ is perfectly simulated, while if $M$
is playing KMSim then game $G_3$ is perfectly simulated, so $M$ returns 1 with the
same probability that $\overline{A}$ wins in each case, and by the
key-malleability of $\mathsf{IBE}$, Equation (6) holds.

Finally, we design $A$ so that

$\mathbf{Adv}^{\mathrm{ibe}}_{\mathsf{IBE}}(A) = 2\Pr[G_3^{\overline{A}}] - 1$ .   (7)

On input $\pi$, adversary $A$ runs $\overline{A}(\pi)$. When the latter makes a
$\mathrm{KD}(\phi, \overline{u})$ query, $A$ does the following:

$u \leftarrow \mathsf{PI}(\pi, \overline{u}, \phi)$ ; $dk \leftarrow \mathrm{KD}(\mathsf{id}, u)$ ; $dk \leftarrow T(\pi, u, dk, \phi)$.

It then returns $dk$ to $\overline{A}$. The KD invoked in this code is $A$'s own
oracle. Compatibility tells us that $u = \mathsf{SI}(\phi(s), \overline{u})$ and
thus, from the definition of $\overline{\mathsf{IBE}}$, the response to
$\overline{A}$'s query is distributed according to $\mathcal{K}(\phi(s), u)$. But
key-malleability then tells us that $dk$ is distributed identically to this, so
the response provided by $A$ is perfectly correct. When $\overline{A}$ makes a
$\mathrm{LR}(\overline{u}, M_0, M_1)$ query, $A$ does the following:

$u \leftarrow \mathsf{PI}(\pi, \overline{u}, \mathsf{id})$ ; $C \leftarrow \mathrm{LR}(u, M_0, M_1)$.

It then returns $C$ to $\overline{A}$. The LR invoked in this code is $A$'s own
oracle. The definition of $\overline{\mathsf{IBE}}$ implies that the response
provided by $A$ is again perfectly correct. Finally, when $\overline{A}$ halts
with output a bit $b'$, adversary $A$ does the same.
5 Applying the Framework

Affine RKD functions for Boneh-Franklin and Waters. We show how
the framework can be instantiated with the IBE schemes of Boneh-Franklin and
Waters to achieve IBE schemes secure against affine related-key attacks. First
we look at key-malleability. Keys in the Boneh-Franklin IBE scheme are of the
form $dk' = H_1(u)^{s}$, so the algorithm $T$ is as follows:

$T(\pi, u, dk', \phi_{a,b})$: $dk \leftarrow (dk')^{a} \cdot H_1(u)^{b}$ ; Ret $dk$

The output of $T$ is a valid key for user $u$ under master secret key
$\phi_{a,b}(s)$, since
$(dk')^{a} \cdot H_1(u)^{b} = H_1(u)^{sa} \cdot H_1(u)^{b} = H_1(u)^{as+b}$.
Since the key derivation algorithm is deterministic, the keys output by $T$ are
distributed identically to the keys output by $\mathcal{K}(\phi(s), u)$, and so
the Boneh-Franklin IBE scheme is key-malleable.
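The identity $(dk')^{a} \cdot H_1(u)^{b} = H_1(u)^{as+b}$ can be checked numerically. The following is a minimal sketch under loudly stated assumptions: it replaces the pairing group by a small prime-order subgroup of the integers modulo a prime, and `hash_to_group` is a hypothetical stand-in for $H_1$, not a real hash function. It only illustrates the algebra of the key simulator, not the scheme itself.

```python
# Toy parameters (assumed for illustration): G generates the subgroup of
# prime order Q in the multiplicative group modulo the prime P = 2Q + 1.
P, Q, G = 2039, 1019, 4


def hash_to_group(identity: str) -> int:
    """Hypothetical stand-in for H1: map an identity to a subgroup element."""
    exponent = sum(identity.encode()) % (Q - 1) + 1
    return pow(G, exponent, P)


def derive_key(s: int, identity: str) -> int:
    """Boneh-Franklin style key dk = H1(u)^s under master secret key s."""
    return pow(hash_to_group(identity), s, P)


def simulate_key(identity: str, dk_prime: int, a: int, b: int) -> int:
    """Key simulator T for phi_{a,b}(s) = a*s + b: dk = dk'^a * H1(u)^b."""
    return pow(dk_prime, a, P) * pow(hash_to_group(identity), b, P) % P


def simulator_matches_direct_derivation(identity: str = "alice") -> bool:
    """Compare T with derivation under a*s + b over a small range of s, a, b."""
    for s in range(1, 30):
        for a in range(0, 6):
            for b in range(0, 6):
                simulated = simulate_key(identity, derive_key(s, identity), a, b)
                direct = derive_key((a * s + b) % Q, identity)
                if simulated != direct:
                    return False
    return True
```

Because key derivation is deterministic here, equality of the two values is the whole key-malleability argument for this scheme; the constant case $a = 0$ needs no special treatment.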

Keys in the Waters IBE scheme are of the form
$(dk'_1, dk'_2) = (g_1^{s} \cdot H(u)^{r}, g^{r})$ for some $r$ in
$\mathbb{Z}_p$, so the algorithm $T$ is as follows:

$T(\pi, u, dk', \phi_{a,b})$:
  If $(a = 0)$ then $r \leftarrow_{\$} \mathbb{Z}_p$ ; $dk_1 \leftarrow g_1^{b} \cdot H(u)^{r}$ ; $dk_2 \leftarrow g^{r}$
  Else $dk_1 \leftarrow (dk'_1)^{a} \cdot g_1^{b}$ ; $dk_2 \leftarrow (dk'_2)^{a}$
  Ret $(dk_1, dk_2)$

When the RKD function is a constant function, $T$ behaves exactly as the key
derivation algorithm under master secret key $b$, so its output is valid and
correctly distributed. Otherwise, the output of $T$ is still a valid key for user
$u$ under master secret key $\phi_{a,b}(s)$, now under randomness $ra$, since:

$(dk'_1)^{a} \cdot g_1^{b} = (g_1^{s} \cdot H(u)^{r})^{a} \cdot g_1^{b} = g_1^{sa+b} \cdot H(u)^{ra}$ and $(dk'_2)^{a} = g^{ra}$ .

Since $r$ is uniformly distributed in $\mathbb{Z}_p$, $ra$ is also uniformly
distributed in $\mathbb{Z}_p$, and so the keys output by $T$ are distributed
identically to those output by $\mathcal{K}(\phi(s), u)$. Hence the Waters IBE
scheme is key-malleable.
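The two branches of the Waters key simulator can be sketched in the same toy setting. This is an illustrative check only: the small modular group, the second generator `G1`, and `hash_to_group` (standing in for $H$) are assumptions, and the randomness is passed in explicitly so that the comparison is deterministic.

```python
# Toy parameters (assumed for illustration): G and G1 generate the subgroup
# of prime order Q in the multiplicative group modulo the prime P = 2Q + 1.
P, Q, G = 2039, 1019, 4
G1 = pow(G, 7, P)


def hash_to_group(identity: str) -> int:
    """Hypothetical stand-in for H: map an identity to a subgroup element."""
    exponent = sum(identity.encode()) % (Q - 1) + 1
    return pow(G, exponent, P)


def waters_key(s: int, identity: str, r: int) -> tuple:
    """Waters style key (g1^s * H(u)^r, g^r) with explicit randomness r."""
    return (pow(G1, s, P) * pow(hash_to_group(identity), r, P) % P, pow(G, r, P))


def simulate_key(identity: str, dk_prime: tuple, a: int, b: int, fresh_r: int) -> tuple:
    """Key simulator T for phi_{a,b}(s) = a*s + b.

    fresh_r is only used in the constant case a = 0, where a new key under
    master secret key b is generated from scratch.
    """
    dk1_prime, dk2_prime = dk_prime
    if a % Q == 0:
        h = hash_to_group(identity)
        return (pow(G1, b, P) * pow(h, fresh_r, P) % P, pow(G, fresh_r, P))
    return (pow(dk1_prime, a, P) * pow(G1, b, P) % P, pow(dk2_prime, a, P))


def simulator_is_consistent(identity: str = "alice") -> bool:
    """T(dk') must equal a key under a*s + b with randomness r*a (or fresh_r)."""
    for s in range(1, 12):
        for r in range(1, 6):
            for a in range(0, 5):
                for b in range(0, 5):
                    out = simulate_key(identity, waters_key(s, identity, r), a, b, 9)
                    if a == 0:
                        expected = waters_key(b, identity, 9)
                    else:
                        expected = waters_key((a * s + b) % Q, identity, (r * a) % Q)
                    if out != expected:
                        return False
    return True
```

The check confirms the change of randomness from $r$ to $ra$; the distributional claim (that $ra$ is uniform when $a \neq 0$) is the part the code does not test.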

The same identity renaming scheme can be used for both IBE schemes.
Namely, $\mathsf{SI}(s, \overline{u})$ returns $\overline{u} \| g^{s}$ and
$\mathsf{PI}(\pi, \overline{u}, \phi_{a,b})$ returns
$\overline{u} \| \pi^{a} \cdot g^{b}$. The compatibility requirement is satisfied
and the renaming scheme is clearly collision-resistant since
$\overline{u}_1 \| g^{\phi(s)} = \overline{u}_2 \| g^{s} \Rightarrow \overline{u}_1 = \overline{u}_2 \wedge \phi(s) = s$.
Thus the IBE schemes of Boneh-Franklin and Waters are key-malleable and admit a
suitable identity renaming scheme, and so satisfy the requirements of Theorem 1.
Notice that in the Waters case, we must increase the parameter $n$ by the bit
length of elements of $\mathbb{G}_1$ (and hence increase the size of the
description of the scheme parameters) to allow identities of the form
$\overline{u} \| g^{s}$ to be used in the renaming scheme.
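Compatibility and collision-resistance of this renaming scheme can be sketched as follows. The code is a toy illustration under assumptions: identities are Python strings, the concatenation $\overline{u} \| g^{s}$ is modelled as a pair, and the group is a small prime-order subgroup modulo a prime rather than a pairing group.

```python
# Toy parameters (assumed for illustration): G generates the subgroup of
# prime order Q in the multiplicative group modulo the prime P = 2Q + 1.
P, Q, G = 2039, 1019, 4


def secret_rename(s: int, identity: str) -> tuple:
    """SI(s, u) = u || g^s, with the concatenation modelled as a pair."""
    return (identity, pow(G, s, P))


def public_rename(pi: int, identity: str, a: int, b: int) -> tuple:
    """PI(pi, u, phi_{a,b}) = u || pi^a * g^b, using only the public key pi = g^s."""
    return (identity, pow(pi, a, P) * pow(G, b, P) % P)


def is_compatible(s: int, identity: str, a: int, b: int) -> bool:
    """Check SI(phi(s), u) = PI(pi, u, phi) for phi(s) = a*s + b."""
    pi = pow(G, s, P)
    return secret_rename((a * s + b) % Q, identity) == public_rename(pi, identity, a, b)


def is_collision_free(s: int, a: int, b: int, id1: str, id2: str) -> bool:
    """Check that (phi(s), id1) != (s, id2) implies SI(phi(s), id1) != SI(s, id2)."""
    phi_s = (a * s + b) % Q
    if (phi_s, id1) == (s, id2):
        return True  # premise is false, nothing to check
    return secret_rename(phi_s, id1) != secret_rename(s, id2)
```

Collision-resistance holds here because $x \mapsto g^{x}$ is injective on exponents modulo the group order, which mirrors the argument in the text.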

The following theorem is obtained by combining Theorem 1 with [?], and
the running time of $B$ below may be obtained in the same way.

Theorem 2. Let $\overline{\mathsf{IBE}} = (\mathcal{S}, \mathcal{P}, \overline{\mathcal{K}}, \overline{\mathcal{E}}, \mathcal{D})$
be the Boneh-Franklin IBE scheme shown in Fig. [?] under the above identity
renaming transform. Let $\overline{A}$ be a $\Phi^{\mathrm{aff}}$-RKA adversary
against $\overline{\mathsf{IBE}}$ making $q_{KD}$ key derivation queries and
$q_{H_2}$ queries to random oracle $H_2$. Then there is an algorithm $B$ solving
the Decision Bilinear Diffie-Hellman problem such that
The following theorem is obtained by combining Theorem 1 with [?], and the
running time of $B$ below may be obtained in the same way. Concrete-security
improvements would be obtained by using instead the analysis of Waters' scheme
from [?].

Theorem 3. Let $\overline{\mathsf{IBE}} = (\mathcal{S}, \mathcal{P}, \overline{\mathcal{K}}, \overline{\mathcal{E}}, \mathcal{D})$
be the Waters scheme shown in Fig. [?] under the above identity renaming
transform. Let $\overline{A}$ be a $\Phi^{\mathrm{aff}}$-RKA adversary against
$\overline{\mathsf{IBE}}$ making $q_{KD}$ key derivation queries. Then there is an
algorithm $B$ solving the Decision Bilinear Diffie-Hellman problem such that

$\mathbf{Adv}^{\mathrm{ibe\text{-}rka}}_{\overline{\mathsf{IBE}},\Phi^{\mathrm{aff}}}(\overline{A}) \le 32(n+1) \cdot q_{KD} \cdot \mathbf{Adv}^{\mathrm{dbdh}}(B)$ .   (9)

We recall from [?] that, given a $\Phi$-RKA-secure IBE scheme, the CHK transform
[?] yields a $\Phi$-RKA-secure CCA-PKE scheme at the cost of adding a strongly
unforgeable one-time secure signature and its verification key to the IBE
ciphertexts. In the full version [?] we show that the more efficient Boneh-Katz
transform can also be used to the same effect. We omit the details of the
$\Phi^{\mathrm{aff}}$-RKA-secure CCA-PKE schemes that result from applying these
transforms to the above IBE schemes. We simply note that the resulting CCA-PKE
schemes are as efficient as the pairing-based schemes of Wee [?], which are only
$\Phi^{\mathrm{lin}}$-RKA-secure. Similarly, using a result of [?], we may apply
the Naor transform to these IBE schemes to obtain $\Phi^{\mathrm{aff}}$-RKA-secure
signature schemes that are closely related to (and as efficient as) the
Boneh-Lynn-Shacham [?] and Waters [?] signature schemes. The verification
algorithms of these signature schemes can be improved by replacing Naor's trial
encryption and decryption procedure by bespoke algorithms, exactly as in [?].

An IBE scheme handling RKAs for bounded degree polynomials.
We show how to construct an IBE scheme that is RKA secure when the RKD
function set equals $\Phi^{\mathrm{poly}(d)}$, the set of all polynomials of
degree at most $d$, for an arbitrary $d$ chosen at the time of master key
generation. The scheme is obtained through a simple extension of the IBE scheme
of Waters combined with the identity renaming transform used above. The only
change we make to the Waters scheme is in the master public key, where we add the
extra elements $g^{s^2}, \ldots, g^{s^d}, g_1^{s^2}, \ldots, g_1^{s^d}$ alongside
$g^{s}$. These elements assist in achieving key-malleability for the set
$\Phi^{\mathrm{poly}(d)}$. The master public-key generation algorithm
$\mathcal{P}$ of the extended Waters scheme, on input $s$, returns
$\pi \leftarrow (g^{s}, g^{s^2}, \ldots, g^{s^d}, (g_1)^{s^2}, \ldots, (g_1)^{s^d})$.
The other algorithms and keys remain unchanged; in particular, key derivation
does not make use of these new elements. This extended Waters IBE scheme is
secure (in the usual IND-CPA sense for IBE) under the $q$-type extension of the
standard DBDH assumption captured by the game in Fig. 5. We define the advantage
of an adversary $A$ against the problem as
$\mathbf{Adv}^{q\text{-edbdh}}(A) = 2\Pr[q\text{-EDBDH}^{A}] - 1$.

Theorem 4. Let $\mathsf{IBE} = (\mathcal{S}, \mathcal{P}, \mathcal{K}, \mathcal{E}, \mathcal{D})$
be the extended Waters scheme. Let $A$ be an adversary against $\mathsf{IBE}$
making $q_{KD}$ key derivation queries. Then there is an algorithm $B$ solving
the $q$-Extended Decision Bilinear Diffie-Hellman problem for $q = d$ such that

$\mathbf{Adv}^{\mathrm{ibe}}_{\mathsf{IBE}}(A) \le 32(n+1) \cdot q_{KD} \cdot \mathbf{Adv}^{q\text{-edbdh}}(B)$ .   (10)


proc Initialize
  g ←$ G_1 ; x, y, z ←$ Z_p ; b ←$ {0,1}
  If (b = 1) T ← e(g,g)^(xyz)
  Else T ←$ G_T
  Ret g, g^x, g^(x^2), ..., g^(x^q), g^y, g^(x^2 y), ..., g^(x^q y), g^z, T

proc Finalize(b')
  Ret (b = b')


Fig. 5. q-Extended Decision Bilinear Diffie-Hellman (q-EDBDH) game

To see this, observe that the original proof of security for Waters' scheme [?]
also goes through for the extended scheme, using the elements $g, g^{x}, g^{y}, T$
from the $q$-EDBDH problem to run the simulation as in the original proof and
using the additional elements from the $q$-EDBDH problem to set up the master
public key in the extended scheme.

We give evidence for the validity of the $q$-EDBDH assumption by examining
the difficulty of the problem in the generic group model. The problem falls within
the framework of the generic group model "master theorem" of Boneh, Boyen and
Goh [?]. In their notation, we have
$P = \{1, x, x^2, \ldots, x^q, y, x^2 y, \ldots, x^q y, z\}$, $Q = 1$, and
$f = xyz$. It is clear by inspection that $P$, $Q$ and $f$ meet the independence
requirement of the master theorem, and it bounds the advantage of an adversary
solving the $q$-EDBDH problem in a generic group by
$(q+1)(q_\xi + 4q + 6)^2 / p$, where $q_\xi$ is a bound on the number of queries
made by the adversary to the oracles computing the group operations in
$\mathbb{G}_1, \mathbb{G}_T$. While a lower bound in the generic group model does
not rule out an efficient algorithm when the group is instantiated, it lends
heuristic support to our assumption.
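To get a feel for the size of the generic-group bound $(q+1)(q_\xi + 4q + 6)^2 / p$, it can be evaluated exactly with rational arithmetic. The sketch below is illustrative only; the function name and the sample parameters in the usage note are assumptions, not values from the paper.

```python
from fractions import Fraction


def edbdh_generic_bound(q: int, group_queries: int, p: int) -> Fraction:
    """Evaluate (q + 1) * (q_xi + 4q + 6)^2 / p exactly.

    q is the parameter of the q-EDBDH problem, group_queries is q_xi (the
    number of generic group oracle queries) and p is the group order.
    """
    return Fraction((q + 1) * (group_queries + 4 * q + 6) ** 2, p)
```

For example, `float(edbdh_generic_bound(4, 2**60, 2**255 - 19))` evaluates the bound for degree-4 polynomials, $2^{60}$ group operations and a 255-bit group order (here the prime $2^{255} - 19$, chosen only as a familiar example). The bound grows linearly in $q$ and quadratically in the number of group operations.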

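The extended Waters IBE scheme is $\Phi^{\mathrm{poly}(d)}$-key-malleable with
algorithm $T$ as follows:

$T(\pi, u, dk', \phi_{a_0, a_1, \ldots, a_d})$:
  If $(a_1 = 0)$ then $r \leftarrow_{\$} \mathbb{Z}_p$ ; $dk_1 \leftarrow g_1^{a_0} \cdot H(u)^{r} \cdot (g_1^{s^2})^{a_2} \cdots (g_1^{s^d})^{a_d}$ ; $dk_2 \leftarrow g^{r}$
  Else $dk_1 \leftarrow g_1^{a_0} \cdot (dk'_1)^{a_1} \cdot (g_1^{s^2})^{a_2} \cdots (g_1^{s^d})^{a_d}$ ; $dk_2 \leftarrow (dk'_2)^{a_1}$
  Ret $(dk_1, dk_2)$

The identity renaming scheme is then defined via

$\mathsf{SI}(s, \overline{u}) = \overline{u} \| g^{s}$ and $\mathsf{PI}(\pi, \overline{u}, \phi_{a_0, a_1, \ldots, a_d}) = \overline{u} \| g^{a_0} \cdot (g^{s})^{a_1} \cdot (g^{s^2})^{a_2} \cdots (g^{s^d})^{a_d}$

which clearly meets the compatibility and collision-resistance requirements.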
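The polynomial case can be sketched in the same way as the affine one. The code below is a toy check under assumptions: a small modular group replaces the pairing group, `hash_to_group` is a hypothetical stand-in for $H$, randomness is passed explicitly, and the simulator branches on the coefficient $a_1$ of $s$, since that is the coefficient by which the randomness of $dk'$ gets multiplied.

```python
# Toy parameters (assumed for illustration): G and G1 generate the subgroup
# of prime order Q in the multiplicative group modulo the prime P = 2Q + 1.
P, Q, G = 2039, 1019, 4
G1 = pow(G, 7, P)


def hash_to_group(identity: str) -> int:
    """Hypothetical stand-in for H: map an identity to a subgroup element."""
    return pow(G, sum(identity.encode()) % (Q - 1) + 1, P)


def eval_poly(coeffs: list, s: int) -> int:
    """phi(s) = a_0 + a_1 s + ... + a_d s^d modulo the group order."""
    return sum(c * pow(s, i, Q) for i, c in enumerate(coeffs)) % Q


def powers_in_exponent(base: int, s: int, d: int) -> list:
    """[base^(s^0), ..., base^(s^d)]; entries of index >= 1 (for g) and
    index >= 2 (for g1) are the ones published in the extended public key."""
    return [pow(base, pow(s, i, Q), P) for i in range(d + 1)]


def waters_key(s: int, identity: str, r: int) -> tuple:
    """Waters style key (g1^s * H(u)^r, g^r) with explicit randomness r."""
    return (pow(G1, s, P) * pow(hash_to_group(identity), r, P) % P, pow(G, r, P))


def simulate_key(g1_powers: list, identity: str, dk_prime: tuple,
                 coeffs: list, fresh_r: int) -> tuple:
    """Key simulator T for a polynomial with coefficients a_0..a_d (d >= 1)."""
    high = 1  # product of (g1^(s^i))^(a_i) for i >= 2, from public elements
    for i in range(2, len(coeffs)):
        high = high * pow(g1_powers[i], coeffs[i], P) % P
    a0, a1 = coeffs[0], coeffs[1]
    if a1 % Q == 0:
        h = hash_to_group(identity)
        return (pow(G1, a0, P) * pow(h, fresh_r, P) * high % P, pow(G, fresh_r, P))
    return (pow(G1, a0, P) * pow(dk_prime[0], a1, P) * high % P,
            pow(dk_prime[1], a1, P))


def public_rename(g_powers: list, identity: str, coeffs: list) -> tuple:
    """PI: u || product of (g^(s^i))^(a_i), computed from public elements only."""
    acc = 1
    for i, c in enumerate(coeffs):
        acc = acc * pow(g_powers[i], c, P) % P
    return (identity, acc)


def secret_rename(s: int, identity: str) -> tuple:
    """SI(s, u) = u || g^s."""
    return (identity, pow(G, s, P))
```

With $s = 5$, $d = 3$ and coefficients `[2, 3, 0, 4]`, the simulated key equals `waters_key(eval_poly(coeffs, 5), u, 3 * r % Q)`, and `public_rename` agrees with `secret_rename(eval_poly(coeffs, 5), u)`, which is the compatibility condition.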
Combining Theorem 1 with Theorem 4 gives the following theorem.

Theorem 5. Let $\overline{\mathsf{IBE}} = (\mathcal{S}, \mathcal{P}, \overline{\mathcal{K}}, \overline{\mathcal{E}}, \mathcal{D})$
be the extended Waters scheme under the above identity renaming transform. Let
$\overline{A}$ be a $\Phi^{\mathrm{poly}(d)}$-RKA adversary against
$\overline{\mathsf{IBE}}$ making $q_{KD}$ key derivation queries. Then there is an
algorithm $B$ solving the $q$-Extended Decision Bilinear Diffie-Hellman problem
for $q = d$ such that

$\mathbf{Adv}^{\mathrm{ibe\text{-}rka}}_{\overline{\mathsf{IBE}},\Phi^{\mathrm{poly}(d)}}(\overline{A}) \le 32(n+1) \cdot q_{KD} \cdot \mathbf{Adv}^{q\text{-edbdh}}(B)$ .   (11)

As in the affine case, we may apply results of [?] to obtain a
$\Phi^{\mathrm{poly}(d)}$-RKA-secure CCA-PKE scheme and a
$\Phi^{\mathrm{poly}(d)}$-RKA-secure signature scheme. We omit the detailed but
obvious description of these schemes, noting merely that they are efficient and
secure in the standard model under the $q$-EDBDH assumption.
Abstract. In this paper, we present the first inner-product encryption 
(IPE) schemes that are unbounded in the sense that the public parame- 
ters do not impose additional limitations on the predicates and attributes 
used for encryption and decryption keys. All previous IPE schemes were 
bounded, or have a bound on the size of predicates and attributes given 
public parameters fixed at setup. The proposed unbounded IPE schemes 
are fully (adaptively) secure and fully attribute-hiding in the standard 
model under a standard assumption, the decisional linear (DLIN) as- 
sumption. In our unbounded IPE schemes, the inner-product relation 
is generalized, where the two vectors of inner-product can be different 
sizes and it provides a great improvement of efficiency in many appli- 
cations. We also present the first fully secure unbounded attribute-based 
encryption (ABE) schemes, and the security is proven under the DLIN 
assumption in the standard model. To achieve these results, we develop 
novel techniques, indexing and consistent randomness amplification, on 
the (extended) dual system encryption technique and the dual pairing 
vector spaces (DPVS). 


1 Introduction 

1.1 Background 

IPE and ABE. The notions of inner-product encryption (IPE) and attribute-based
encryption (ABE) introduced by Katz, Sahai and Waters [?] and Sahai and
Waters [?] constitute an advanced class of encryption, functional encryption
(FE), and provide more flexible and fine-grained functionalities in sharing and
distributing sensitive data than traditional symmetric and public-key encryption
as well as identity-based encryption (IBE).

In FE, there is a relation R(v, x) that determines whether a secret key
associated with a parameter v can decrypt a ciphertext encrypted under another
parameter x. The parameters for IPE are expressed as vectors x (for encryption)
and v (for a secret key), where R(v, x) holds, i.e., a secret key with v can
decrypt a ciphertext with x, iff $v \cdot x = 0$. (Here, $v \cdot x$ denotes the
standard inner-product.)
In ABE systems, either one of the parameters for encryption and secret key is 
a set of attributes, and the other is an access policy (structure) or (monotone) 

X. Wang and K. Sako (Eds.): ASIACRYPT 2012, LNCS 7658, pp. 349-gBS] 2012. 
span program over a universe of attributes, e.g., a secret key for a user is
associated with an access policy and a ciphertext is associated with a set of
attributes, where a secret key can decrypt a ciphertext iff the attribute set
satisfies the policy. If the access policy is for a secret key, it is called
key-policy ABE (KP-ABE), and if the access policy is for encryption, it is
ciphertext-policy ABE (CP-ABE).

For some applications, the parameters for encryption are required to be hidden
from ciphertexts. To capture this security requirement, Katz, Sahai and
Waters [?] introduced attribute-hiding (based on the same notion for hidden
vector encryption (HVE) by Boneh and Waters [?]), a security notion for FE that
is stronger than the basic security requirement, payload-hiding. Roughly
speaking, attribute-hiding requires that a ciphertext conceal the associated
parameter as well as the plaintext, while payload-hiding only requires that a
ciphertext conceal the plaintext. A weaker notion of attribute-hiding than the
original one was given by [?]. The weaker notion is called weakly
attribute-hiding, and the original one is fully attribute-hiding. Informally, in
fully attribute-hiding, the secrecy of attribute x is ensured even against an
adversary having a secret key with v such that R(v, x) holds (i.e., no
information is released on x except that R(v, x) holds), while in weakly
attribute-hiding it is ensured only when R(v, x) does not hold (see
Definition [?] for the definition of fully attribute-hiding).

To the best of our knowledge, the widest class of attribute-hiding FE is IPE
[?] (the KSW08, LOS+10, OT10 and OT12 schemes). Inner-products for
IPE represent a fairly wide class of relations including equality tests as the
simplest case (i.e., anonymous IBE and HVE are very special classes of
attribute-hiding IPE), disjunctions or conjunctions of equality tests, and, more
generally, CNF or DNF formulas. We note, however, that inner-product relations
are less expressive than a class of relations (on span programs) for ABE, while
existing ABE schemes for such a wider class of relations are not attribute-hiding
but only payload-hiding.

Among the existing IPE schemes, only the OT12 IPE scheme [?] achieves
full (adaptive) security and fully attribute-hiding simultaneously, whereas
other attribute-hiding IPE schemes [?] are selectively secure or weakly
attribute-hiding, and some IPE schemes [?] only achieve payload-hiding. As
for ABE, the Lewko et al. and Okamoto-Takashima ABE schemes [7, 12] are fully
secure in the standard model, while ABE schemes [?] before [?]
were selectively secure.

Unbounded IPE and ABE. All previous constructions of IPE and ABE except the
Lewko-Waters ABE scheme [8] are restricted, or bounded, in the choice of the
parameters for secret key and encryption once the public parameters have been
set. The only unbounded ABE scheme [8], however, is selectively secure, while
the same work presented an unbounded hierarchical identity-based encryption
(HIBE) scheme that is fully secure in the standard model. No unbounded IPE scheme
has been presented. Therefore, no fully secure and unbounded scheme for an
advanced class of encryption like IPE or ABE has been presented.
In practice, it is highly desirable that the parameters for secret key and
encryption be flexible, i.e., not bounded by the public parameters fixed at
setup, since if we set the public parameters for a possible maximum size (e.g.,
the maximum dimension of predicate and attribute vectors for IPE), the size of
the public parameters would be huge.

Removing the restrictions for fully secure IPE and ABE, however, is quite 
challenging. As mentioned above, no fully secure and unbounded scheme for an 
advanced class of encryption like IPE or ABE has been presented. The difficulty 
resides in the existing techniques for proving the full (or adaptive) security of 
such an advanced class of encryption. 

The only known technique to prove the full security of an (attribute-hiding)
IPE or ABE system is the dual system encryption by Waters [?] and its
extension [?]. In these techniques, information-theoretic arguments (e.g., a
conceptual change due to two distributions being identical, or the independent
randomness of two distributions) over some (hidden) parts of a secret key and
challenge ciphertext play a key role in the security proof, provided that the
adversary follows the secret-key-query condition in the security games. To
execute a security proof based on the information-theoretic arguments, an
appropriate distribution of randomness consistent with the key-query condition
should be supplied in the proof games transformed from the original proof game.

As for bounded IPE and ABE schemes, the public parameters can supply enough
inherent randomness for the arguments, since the size of parameters for
secret keys and encryption is bounded by the public parameters. For example,
when the dimension of vectors for IPE is required to be $n$, public parameters
of size $O(n)$ should be given in bounded IPE, and the size of the secret
randomness used to generate the public parameters is $O(n^2)$. Such an amount
of randomness can be enough for the arguments over $n$-dimensional vectors.

In contrast, for unbounded IPE and ABE schemes, some (unbounded amount
of) randomness whose distribution is consistent with the key-query condition
should be supplied in addition to the randomness provided by the public
parameters. For example, even when the dimension of vectors for IPE is required
to be $n$, the size of the public parameters is $O(1)$ in unbounded IPE, i.e.,
the size of the secret randomness used to generate the public parameters is
$O(1)$. Clearly, such an amount of randomness is not sufficient for the
information-theoretic arguments over $n$-dimensional vectors. Therefore, an
additional source of randomness should be provided, and the distribution of the
randomness should be specific (i.e., consistent with the key-query condition).
For the unbounded HIBE scheme [8], where equality (un-)matching is the key-query
condition, a simple compression technique works well to create such randomness,
since equality can be simply compressed while preserving the property. The
key-query condition for IPE and ABE, however, is in general much more
complicated than just the equality matching for (H)IBE, and no technique was
known to create randomness consistent with such a complicated condition in
security proofs. This is a reason why [8] succeeds in realizing a fully secure
unbounded HIBE scheme but not a fully secure unbounded ABE (or IPE) scheme.


352 T. Okamoto and K. Takashima 


Restriction on IPE. The existing IPE schemes have another restriction on the
parameters (i.e., vectors) for secret key and encryption: the dimensions of x
(for encryption) and v (for a secret key) must be equal. Such a restriction
may be considered inevitable for the inner-product relation over
$\mathbb{F}_q$, but it needs to be relaxed in various applications to improve
efficiency, especially in unbounded IPE systems where the setup (public)
parameters give no restriction on the dimensions of vectors.

Let us consider an example on the genetic profile data of an individual. It
is desirable that such sensitive data be treated as encrypted data even for
data processing and retrieval. Although a genetic profile may include a large
amount of information, only a part of the profile is examined in many
applications. For example, let $X_1, \ldots, X_{100}$ be variables of 100
genetic properties and $x_1, \ldots, x_{100}$ be Alice's values of these
variables. To evaluate whether $f(x_1, \ldots, x_{100}) = 0$ for any examination
(multivariate) polynomial $f$ of degree 3, or the truth value of the
corresponding predicate $\phi_f(x_1, \ldots, x_{100})$, the attribute vector x
of Alice should be a monomial vector of Alice's values with degree 3,
$x := (1, x_1, \ldots, x_{100}, x_1^2, x_1 x_2, \ldots, x_{100}^2, x_1^3, x_1^2 x_2, \ldots, x_{100}^3)$,
whose dimension is around $10^6$. A predicate vector v for a secret key can be
associated with predicate $\phi_f$.

To ensure the private data processing of x, it should be encrypted (say c
for a ciphertext of x) by a fully attribute-hiding IPE scheme, since whether
$\phi_f(x_1, \ldots, x_{100})$ holds can then be examined, releasing no other
information, by checking whether c can be decrypted by a secret key with v
(i.e., whether R(v, x) holds). Here, if c is encrypted by fully attribute-hiding
IPE, it releases no information on x except that R(v, x) holds, i.e., that
$\phi_f(x_1, \ldots, x_{100})$ holds. If it is encrypted by weakly
attribute-hiding IPE, however, such security cannot be ensured.

Let a predicate for v be $((X_5 = a) \vee (X_{16} = b)) \wedge (X_{57} = c)$,
which focuses on only three factors, $X_5, X_{16}, X_{57}$, among the 100 genetic
properties. It can be represented by a polynomial equation,
$r_1 (X_5 - a)(X_{16} - b) + r_2 (X_{57} - c) = 0$ (where
$r_1, r_2 \leftarrow \mathbb{F}_q$ are chosen uniformly), i.e.,
$(r_1 ab - r_2 c) - r_1 b X_5 - r_1 a X_{16} + r_2 X_{57} + r_1 X_5 X_{16} = 0$.
In order that $r_1 (x_5 - a)(x_{16} - b) + r_2 (x_{57} - c) = 0$ iff
$v \cdot x = 0$, vector v should be
$((r_1 ab - r_2 c), 0, \ldots, 0, -r_1 b, 0, \ldots, 0, -r_1 a, 0, \ldots, 0, r_2, 0, \ldots, 0, r_1, 0, \ldots, 0)$,
whose dimension is equal to that of x, i.e., around $10^6$, although the
effective dimension of v is just 5. This is due to the above-mentioned
restriction on the inner-product relation of the existing IPE schemes. The size
of the secret key for v then should be in proportion to the dimension of v (and
x), around $10^6$. This example shows a strong practical motivation, especially
for unbounded IPE schemes, to relax this restriction on the inner-product
relation and to shorten the length of the secret key to one in proportion to the
effective dimension, e.g., 5, instead of around $10^6$.
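The gap between the full dimension and the effective dimension in this example can be made concrete. The following sketch is illustrative and rests on assumptions: it works modulo a small toy prime `Q`, it assumes the monomials are listed as ordered products (so that degree $k$ contributes $100^k$ entries), and the helper names are hypothetical.

```python
Q = 1019        # assumed toy prime standing in for the field size q
NUM_VARS = 100  # number of genetic properties in the example


def monomial_index(*variables: int) -> int:
    """1-based position of the ordered monomial X_i X_j ... in the vector x."""
    offset = sum(NUM_VARS ** k for k in range(len(variables)))  # lower degrees
    position = 0
    for var in variables:
        position = position * NUM_VARS + (var - 1)
    return offset + position + 1


def full_dimension(max_degree: int = 3) -> int:
    """Number of entries of x when all monomials up to max_degree are listed."""
    return sum(NUM_VARS ** k for k in range(max_degree + 1))


def sparse_predicate(a: int, b: int, c: int, r1: int, r2: int) -> dict:
    """Nonzero coefficients of r1 (X5 - a)(X16 - b) + r2 (X57 - c), by monomial."""
    return {
        (): (r1 * a * b - r2 * c) % Q,
        (5,): (-r1 * b) % Q,
        (16,): (-r1 * a) % Q,
        (57,): r2 % Q,
        (5, 16): r1 % Q,
    }


def inner_product(coeffs: dict, values: dict) -> int:
    """v . x restricted to the nonzero entries of v; values maps i to x_i."""
    total = 0
    for monomial, coeff in coeffs.items():
        term = coeff
        for var in monomial:
            term = term * values[var] % Q
        total = (total + term) % Q
    return total
```

Only the five entries returned by `sparse_predicate` contribute to $v \cdot x$, while `full_dimension()` counts 1,010,101 positions. When the predicate holds, the inner product is 0; when it fails, it is nonzero except for an unlucky choice of $r_1, r_2$.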

1.2 Our Results 

1. This paper introduces a new concept of IPE, generalized IPE, which relaxes 
the above-mentioned restriction of IPE and consists of three types of IPE, 
Types 0, 1 and 2. Here the notion of Types 1 and 2 is introduced in this 
paper, and Type 0 is the traditional one (see Remark below). 
Table 1. Comparison of attribute-hiding IPE schemes, where |G| and |G_T|
represent the size of an element of G and that of G_T, respectively. AH, IP, PK,
SK, CT, GSD and eDDH stand for attribute-hiding, inner-product, master public key
(public parameters), secret key, ciphertext, general subgroup decision [?] and
extended decisional Diffie-Hellman [?], respectively.

* It can be easily relaxed.


Remark: We now roughly explain the three types of inner-product relations.
To relax the above-mentioned restriction on the inner-product relation, we
introduce a new type of inner-product (generalized inner-product) for v and x,
where their dimensions can be different (say $n$ and $n'$ for the dimensions of
v and x). In this notion, vectors v and x are expressed by
$\{(t, v_t) \mid t \in I_v, \#I_v = n\}$ and
$\{(t, x_t) \mid t \in I_x, \#I_x = n'\}$, respectively, where
$t \in \mathbb{N}$ is an index for vectors, whose semantics is given by each
application. Here note that we abuse the same vector notation, v, for the new
expression as well as for the conventional one, $(v_1, \ldots, v_n)$. In the
above-mentioned example,
$x := \{(1, 1), (2, x_1), \ldots, (101, x_{100}), (102, x_1^2), (103, x_1 x_2), \ldots, (n', x_{100}^3)\}$
where $I_x := \{1, 2, \ldots, n'\}$, and
$v := \{(1, r_1 ab - r_2 c), (6, -r_1 b), (17, -r_1 a), (58, r_2), (517, r_1)\}$
where $I_v := \{1, 6, 17, 58, 517\}$. The generalized inner-product of v over x
is defined by $\sum_{t \in I_v} v_t \cdot x_t$ if $I_v \subseteq I_x$. Otherwise,
it is undefined.

By using the generalized inner-product notion, the secret key size can be in
proportion to the effective dimension (e.g., 5 instead of around $10^6$).

We then introduce three types of IPE schemes. For Type 1, relation R(v, x)
holds iff the generalized inner-product of v over x is 0, while for Type 2 it
holds iff the generalized inner-product of x over v is 0. We call the
conventional inner-product Type 0, i.e., relation R(v, x) is defined by the
standard inner-product of v and x, where v and x have the same dimension
(in other words, the inner-product for Type 0 is defined iff these dimensions
are equal).

Table 2. Comparison of KP-ABE schemes, where |G| represents the size of an
element of G, and PK, SK, CT and GSD stand for master public key (public
parameters), secret key, ciphertext and general subgroup decision [?],
respectively. Here d, n, n_max, ℓ and k_max are the number of sub-universes of
attributes, the number of attributes for a CT, the maximum number of attributes
for a CT, the row size of an access policy matrix for a SK and the maximum value
of the degree of access policies, respectively. A cell marked ? is not legible
in the source.

                           LW11 [8]      LOS+10 [7]    LOS+10 [7]    OT10 [12]     OT10 [12]     Proposed      Proposed
                                         (basic)       (modified)    (basic)       (modified)    (basic)       (modified)
                                                                                                 Section [?]   in full ver.
Bounded or unbounded       unbounded     bounded       bounded       bounded       bounded       unbounded     unbounded
Security                   selective     full          full          full          full          full          full
Order of G                 composite     composite     composite     prime         prime         prime         prime
Assumption                 GSD           GSD           GSD           DLIN          DLIN          DLIN          DLIN
Degree of access policies  arbitrary     1             arbitrary     1             arbitrary     1             arbitrary
PK size                    O(1)|G|       ?             O(n_max)|G|   O(d)|G|       O(d)|G|       O(1)|G|       O(1)|G|
SK size                    O(ℓ)|G|       O(ℓ)|G|       O(ℓ)|G|       O(ℓ)|G|       O(ℓ)|G|       O(ℓ)|G|       O(ℓ)|G|
CT size                    O(n)|G|       O(n)|G|       O(k_max n)|G| O(n)|G|       O(k_max n)|G| O(n)|G|       O(k_max n)|G|

2. We present the first unbounded inner-product encryption (IPE) schemes. The
proposed unbounded IPE schemes are fully (adaptively) secure and fully
attribute-hiding in the standard model under a standard assumption, the
decisional linear (DLIN) assumption. The proposed unbounded IPE schemes
cover the above-mentioned types of generalized IPE, Types 0, 1 and 2.
For a comparison of attribute-hiding IPE schemes, see Table 1.

3. We present the first unbounded KP- and CP-ABE schemes that are fully 
secure (adaptively payload-hiding) in the standard model. The proposed 
unbounded ABE schemes are fully secure under the DLIN assumption, and 
are for a wide class of relations, non-monotone access structures (see the full 
version for the proposed CP-ABE scheme). See Table 2 for a comparison of
KP-ABE schemes.

Remark: Similarly to the existing fully secure ABE schemes in the standard
model 0, EQ except 0, our basic ABE scheme (Section 5) has
the restriction that the degree of access policies is 1 (see footnote 1). A modified KP-ABE
scheme is shown in the full version of this paper to relax the restriction, i.e.,
to achieve an arbitrary degree k of access policies while preserving the fully

1 Informally, the degree is the number of appearances of a variable in a formula,
e.g., the formula ((x = a) ∨ (x = b)) ∧ (y = c) has degree 2 for variable x. For the definition
of the degree of access policies in our schemes, see the full version. The degree is
defined slightly differently in 0, 0, 0, [53, 0, H , where degree 1 is called one-use.

secure and unbounded property. It, however, shares a shortcoming of the existing
fully secure (modified) ABE schemes smii that the ciphertext size
grows linearly with k. Here, a (maximum) value of k can be determined in 
each application of our ABE scheme, while the public parameters are fixed 
and commonly shared by all applications and users. 
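
The informal notion of degree used in the remark and its footnote can be illustrated with a small sketch. The tuple encoding of formulas and the function name `variable_degrees` are assumptions made for this illustration only; the paper's formal definition is in its full version.

```python
from collections import Counter

def variable_degrees(formula):
    """Count appearances of each variable in a formula tree.

    Encoding assumed for this sketch: ("and" | "or", left, right) for
    connectives, and ("eq", variable, value) for an atom `variable = value`.
    """
    counts = Counter()
    stack = [formula]
    while stack:
        node = stack.pop()
        if node[0] == "eq":
            counts[node[1]] += 1      # one more appearance of this variable
        else:
            stack.extend(node[1:])    # descend into both sub-formulas
    return counts

# The footnote's example: ((x = a) OR (x = b)) AND (y = c)
example = ("and", ("or", ("eq", "x", "a"), ("eq", "x", "b")), ("eq", "y", "c"))
degrees = variable_degrees(example)
```

Under this reading, a policy is one-use (degree 1) exactly when no variable is counted more than once.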


1.3 Key Techniques 

As mentioned above, the difficulty of realizing a fully secure unbounded IPE 
or ABE scheme arises from the hardness of supplying an unbounded amount of 
randomness consistent with the complicated key-query condition for the (dual 
system encryption) security arguments on IPE or ABE. To overcome this dif- 
ficulty, we develop novel techniques, indexing and consistent randomness am- 
plification, on the dual system encryption and the dual pairing vector spaces 
(DPVS). Roughly speaking, the indexing technique is for supplying a source of 
unbounded amount of randomness and the consistent randomness amplification 
technique is for amplifying the randomness of the source through a computa- 
tional assumption (e.g., the DLIN assumption in our case) and the randomness 
of hidden bases as well as for adjusting the distribution of the amplified ran- 
domness to be consistent with a condition. This methodology could provide a 
general framework for proving the security in unbounded situations. 

In DPVS, a pair of dual (or orthonormal) bases for $N$-dimensional linear
spaces, $\mathbb{B} := (\boldsymbol{b}_1, \ldots, \boldsymbol{b}_N)$ and $\mathbb{B}^* := (\boldsymbol{b}^*_1, \ldots, \boldsymbol{b}^*_N)$, are randomly generated using
a secret random linear transformation $X$ (a random $N \times N$ matrix) (see Section
2). In a typical application of DPVS to cryptography, a part of $\mathbb{B}$ (say $\widehat{\mathbb{B}}$) is used
as a public key (public parameters), and $\mathbb{B}^*$ as a secret key, where $X$ is the top
level secret key and the source of randomness.

In a typical construction of bounded IPE schemes ft mill which are based
on DPVS, once a part of a basis of an $N$-dimensional space
is published as public parameters, the dimension $n$ of predicate and attribute
vectors for secret key and encryption is bounded or fixed, e.g., $n < N/4$ (i.e.,
$N = O(n)$). The full security is proven through information theoretical
arguments, and the randomness of secret matrix $X$ (e.g., the amount of the
randomness is $O(n^2)$) supplies enough randomness for the arguments.

In contrast, the dimension, $n$, of the predicate and attribute vectors is not
bounded by the public parameters in unbounded IPE. For example, in one of
the proposed IPE schemes (Section 4.1), the public parameters consist of a constant
number of elements, 9 elements of bases (or 105 pairing group elements),
$\widehat{\mathbb{B}}_0 := (\boldsymbol{b}_{0,1}, \boldsymbol{b}_{0,3}, \boldsymbol{b}_{0,5})$ and $\widehat{\mathbb{B}} := (\boldsymbol{b}_1, \ldots, \boldsymbol{b}_4, \boldsymbol{b}_{14}, \boldsymbol{b}_{15})$, where random matrices of
constant sizes, $X_0 \in \mathbb{F}_q^{5 \times 5}$ and $X_1 \in \mathbb{F}_q^{15 \times 15}$, are employed to generate the
public parameters. The randomness of the public parameters, just a constant
amount with respect to $n$, is clearly insufficient for the (dual system encryption)
arguments on the proof of full security.

To supply additional randomness for this purpose, in our IPE schemes, we
introduce a technique called indexing, where two-dimensional index vectors,
$\sigma_t(1, t)$ and $\mu_t(t, -1)$, are embedded into ciphertext $\boldsymbol{c}_t$ and secret key $\boldsymbol{k}^*_t$, respectively,
where $\sigma_t$ and $\mu_t$ are freshly random for each $t$. In our IPE scheme
(Section 4.1) where $n = n'$ for simplicity, for example, secret key $(\boldsymbol{k}^*_1, \ldots, \boldsymbol{k}^*_n)$ for
$\vec{v} := (v_1, \ldots, v_n)$ can be expressed by a coefficient vector, $(\mu_t(t, -1), \delta v_t, \ldots)$,
for $t = 1, \ldots, n$, over basis $\mathbb{B}^*$, i.e., $\boldsymbol{k}^*_t := (\mu_t(t, -1), \delta v_t, \ldots)_{\mathbb{B}^*}$, and ciphertext
$(\boldsymbol{c}_1, \ldots, \boldsymbol{c}_n)$ for $\vec{x} := (x_1, \ldots, x_n)$ can be expressed by $\boldsymbol{c}_t := (\sigma_t(1, t), \omega x_t, \ldots)_{\mathbb{B}}$
for $t = 1, \ldots, n$, where $\delta, \omega$ are randomly selected. While the size of the public
parameters or its randomness is constant in $n$, an unbounded amount of randomness,
$\{\mu_t\}_{t=1,\ldots,n}$ and $\{\sigma_t\}_{t=1,\ldots,n}$, can be supplied to secret key and ciphertext.
This is a key idea of the indexing technique.
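
One property of the index vectors that can be checked directly is that the index parts of a ciphertext component and a key component cancel when they carry the same index $t$, since $\sigma_t(1, t) \cdot \mu_t(t, -1) = \sigma_t \mu_t (t - t) = 0$, and do not cancel otherwise. The following sketch verifies this over a toy prime field; the modulus `Q` and the helper names are assumptions of this illustration, not part of the scheme.

```python
import random

Q = 2**61 - 1  # toy prime modulus standing in for the group order q (assumption)

def ct_index(t, sigma):
    """Index part of a ciphertext component: sigma_t * (1, t)."""
    return (sigma % Q, sigma * t % Q)

def key_index(t, mu):
    """Index part of a key component: mu_t * (t, -1)."""
    return (mu * t % Q, -mu % Q)

def inner(a, b):
    """Inner product over F_Q."""
    return sum(x * y for x, y in zip(a, b)) % Q

rng = random.Random(0)
# Fresh nonzero randomness for every index t, as in the indexing technique.
sigmas = {t: rng.randrange(1, Q) for t in range(1, 6)}
mus = {t: rng.randrange(1, Q) for t in range(1, 6)}
# cancels[(t, u)] is True when the index parts of c_t and k*_u cancel.
cancels = {(t, u): inner(ct_index(t, sigmas[t]), key_index(u, mus[u])) == 0
           for t in sigmas for u in mus}
```

Only the pairs with matching indices cancel, which is what lets each $\boldsymbol{k}^*_t$ be paired with its own $\boldsymbol{c}_t$ while the public parameters stay constant in size.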

Although the technique supplies an unbounded amount of randomness, i.e.,
$O(n)$-size randomness, it is not enough for our purpose. We need more randomness,
and with a specific distribution. This is because, in the proof of full security
on dual system encryption and the extension, such real randomness provided
by the indexing technique should be expanded into a hidden part in spaces over 
bases B and B*, and the distribution should be also adjusted to (or consistent 
with) the key-query condition for IPE or ABE. For this purpose, i.e., in order 
to amplify the randomness to a hidden subspace and to adjust it to a specific 
distribution, we develop another technique, consistent randomness amplification. 

For a slightly more detailed explanation of the consistent randomness amplification
technique, we briefly review a hidden part (subspace) of DPVS. As
mentioned above, in a typical application of DPVS to cryptography, a part of $\mathbb{B}$
(say $\widehat{\mathbb{B}}$) is used as a public key (public parameters). Therefore, the basis, $\mathbb{B} - \widehat{\mathbb{B}}$,
is information theoretically concealed against an adversary, i.e., even an adversary
with unbounded computational power has no idea which basis is selected as $\mathbb{B} - \widehat{\mathbb{B}}$ when $\widehat{\mathbb{B}}$
is published. The underlying dual vector spaces, $\mathrm{span}\langle\mathbb{B}\rangle$ and $\mathrm{span}\langle\mathbb{B}^*\rangle$, are 15-dimensional
for our IPE scheme (Type 1 or 2) and 14-dimensional for our ABE
scheme. The subspaces employed for public parameters are just 6-dimensional
and another 2-dimensional basis can be public. Hence, the basis for the remaining
7 or 6-dimensional subspace is information theoretically concealed (uncertain).
The consistent randomness amplification technique is executed over these 7 or
6-dimensional hidden subspaces. For example, as mentioned above, a real secret
key $\{\boldsymbol{k}^*_t\}$ and ciphertext $\{\boldsymbol{c}_t\}$ are expressed by $\boldsymbol{k}^*_t := (\mu_t(t, -1), \delta v_t, s_t, \boxed{0^7}, \ldots)_{\mathbb{B}^*}$
and $\boldsymbol{c}_t := (\sigma_t(1, t), \omega x_t, \tilde{\omega}, \boxed{0^7}, \ldots)_{\mathbb{B}}$. This technique provides a transformation
(for the dual system encryption technique and the extension) to the following
forms: $\boldsymbol{k}^*_t := (\mu_t(t, -1), \delta v_t, s_t, \boxed{0^4, (\pi v_t, a_t) \cdot U_t, 0}, \ldots)_{\mathbb{B}^*}$ and
$\boldsymbol{c}_t := (\sigma_t(1, t), \omega x_t, \tilde{\omega}, \boxed{\ldots, (\tau x_t, \tau) \cdot Z_t, 0}, \ldots)_{\mathbb{B}}$,
where $Z_t$ is an independently random $2 \times 2$ matrix for each $t$ and $U_t := (Z_t^{\mathrm{T}})^{-1}$, and other new variables are
random. Here, the box-framed parts are the information theoretically hidden
subspaces, the randomness of the hidden parts is amplified and the distribution
of $(\pi v_t, a_t) \cdot U_t$ and $(\tau x_t, \tau) \cdot Z_t$ is consistent with the key-query condition.

The consistent randomness amplification technique is composed of several
computational and conceptual (information theoretical) transformations. One
of the key tricks of the transformations is to amplify a source of randomness to
a hidden part by applying a computational assumption, the DLIN assumption.
Another computational trick is to swap two vectors in different positions under
DLIN. Information theoretical key tricks are inter-subspace and intra-subspace
types of conceptual transformations (see the full version for more details).

The security proofs of our IPE and ABE schemes are hierarchically con- 
structed in a modular manner. The very top level of the security proof is based 
on the dual system encryption and its extension. Several problems in the mid- 
dle level support the top level arguments. Our key techniques, the indexing and 
consistent randomness amplification techniques, which are also constructed in a 
hierarchical manner, are employed in the lowest level to reduce the hardness of 
the middle level problems to the DLIN assumption. 

1.4 Notations 

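When $A$ is a random variable or distribution, $y \xleftarrow{\mathsf{R}} A$ denotes that $y$ is randomly
selected from $A$ according to its distribution. When $A$ is a set, $y \xleftarrow{\mathsf{U}} A$ denotes
that $y$ is uniformly selected from $A$. $y := z$ denotes that $y$ is set, defined or
substituted by $z$. We denote the finite field of order $q$ by $\mathbb{F}_q$, $\mathbb{F}_q \setminus \{0\}$ by $\mathbb{F}_q^{\times}$,
and the set of positive integers by $\mathbb{N}$. The vector $\vec{0}$ is abused as the zero vector
in $\mathbb{F}_q^n$ for any $n$. $X^{\mathrm{T}}$ denotes the transpose of matrix $X$. A bold face letter
denotes an element of vector space $\mathbb{V}$, e.g., $\boldsymbol{x} \in \mathbb{V}$. When $\boldsymbol{b}_i \in \mathbb{V}$ ($i = 1, \ldots, n$),
$\mathrm{span}\langle \boldsymbol{b}_1, \ldots, \boldsymbol{b}_n \rangle \subseteq \mathbb{V}$ (resp. $\mathrm{span}\langle \vec{x}_1, \ldots, \vec{x}_n \rangle$) denotes the subspace generated by
$\boldsymbol{b}_1, \ldots, \boldsymbol{b}_n$ (resp. $\vec{x}_1, \ldots, \vec{x}_n$). For bases $\mathbb{B} := (\boldsymbol{b}_1, \ldots, \boldsymbol{b}_N)$ and $\mathbb{B}^* := (\boldsymbol{b}^*_1, \ldots, \boldsymbol{b}^*_N)$,
$(x_1, \ldots, x_N)_{\mathbb{B}} := \sum_{i=1}^{N} x_i \boldsymbol{b}_i$ and $(y_1, \ldots, y_N)_{\mathbb{B}^*} := \sum_{i=1}^{N} y_i \boldsymbol{b}^*_i$. $\vec{e}_1$ and $\vec{e}_2$ denote
the canonical basis vectors in $\mathbb{F}_q^2$, i.e., $\vec{e}_1 := (1, 0)$ and $\vec{e}_2 := (0, 1)$. $GL(n, \mathbb{F}_q)$
denotes the general linear group of degree $n$ over $\mathbb{F}_q$.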

2 Dual Pairing Vector Spaces by Direct Product of 
Symmetric Pairing Groups 

Definition 1. "Symmetric bilinear pairing groups" $(q, \mathbb{G}, \mathbb{G}_T, G, e)$ are a tuple
of a prime $q$, cyclic additive group $\mathbb{G}$ and multiplicative group $\mathbb{G}_T$ of order $q$,
$G \neq 0 \in \mathbb{G}$, and a polynomial-time computable nondegenerate bilinear pairing
$e : \mathbb{G} \times \mathbb{G} \to \mathbb{G}_T$, i.e., $e(sG, tG) = e(G, G)^{st}$ and $e(G, G) \neq 1$. Let $\mathcal{G}_{\mathsf{bpg}}$ be an
algorithm that takes input $1^\lambda$ and outputs a description of bilinear pairing groups
$(q, \mathbb{G}, \mathbb{G}_T, G, e)$ with security parameter $\lambda$.

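Definition 2. "Dual pairing vector spaces (DPVS)" $(q, \mathbb{V}, \mathbb{G}_T, \mathbb{A}, e)$ by a direct
product of symmetric pairing groups $(q, \mathbb{G}, \mathbb{G}_T, G, e)$ are a tuple of prime $q$,
$N$-dimensional vector space $\mathbb{V} := \overbrace{\mathbb{G} \times \cdots \times \mathbb{G}}^{N}$ over $\mathbb{F}_q$, cyclic group $\mathbb{G}_T$ of order $q$,
canonical basis $\mathbb{A} := (\boldsymbol{a}_1, \ldots, \boldsymbol{a}_N)$ of $\mathbb{V}$, where $\boldsymbol{a}_i := (\overbrace{0, \ldots, 0}^{i-1}, G, \overbrace{0, \ldots, 0}^{N-i})$, and
pairing $e : \mathbb{V} \times \mathbb{V} \to \mathbb{G}_T$. The pairing is defined by $e(\boldsymbol{x}, \boldsymbol{y}) := \prod_{i=1}^{N} e(G_i, H_i) \in \mathbb{G}_T$
where $\boldsymbol{x} := (G_1, \ldots, G_N) \in \mathbb{V}$ and $\boldsymbol{y} := (H_1, \ldots, H_N) \in \mathbb{V}$. This is nondegenerate
bilinear, i.e., $e(s\boldsymbol{x}, t\boldsymbol{y}) = e(\boldsymbol{x}, \boldsymbol{y})^{st}$ and if $e(\boldsymbol{x}, \boldsymbol{y}) = 1$ for all $\boldsymbol{y} \in \mathbb{V}$, then $\boldsymbol{x} = \boldsymbol{0}$.
For all $i$ and $j$, $e(\boldsymbol{a}_i, \boldsymbol{a}_j) = e(G, G)^{\delta_{i,j}}$ where $\delta_{i,j} = 1$ if $i = j$, and 0 otherwise,
and $e(G, G) \neq 1 \in \mathbb{G}_T$. DPVS generation algorithm $\mathcal{G}_{\mathsf{dpvs}}$ takes input $1^\lambda$ ($\lambda \in \mathbb{N}$)
and $N \in \mathbb{N}$, and outputs a description of $\mathsf{param}_{\mathbb{V}} := (q, \mathbb{V}, \mathbb{G}_T, \mathbb{A}, e)$ with security
parameter $\lambda$ and $N$-dimensional $\mathbb{V}$. It can be constructed by using $\mathcal{G}_{\mathsf{bpg}}$.

For the asymmetric version of DPVS, see Appendix A.2 in 0. We describe
random dual orthonormal basis generator $\mathcal{G}_{\mathsf{ob}}$, which is used as a subroutine in
our IPE and ABE schemes.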

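$\mathcal{G}_{\mathsf{ob}}(1^\lambda, (N_t)_{t=0,1})$ : $\mathsf{param}_{\mathbb{G}} := (q, \mathbb{G}, \mathbb{G}_T, G, e) \xleftarrow{\mathsf{R}} \mathcal{G}_{\mathsf{bpg}}(1^\lambda)$, $\psi \xleftarrow{\mathsf{U}} \mathbb{F}_q^{\times}$,
  for $t = 0, 1$, $\mathsf{param}_{\mathbb{V}_t} := (q, \mathbb{V}_t, \mathbb{G}_T, \mathbb{A}_t, e) := \mathcal{G}_{\mathsf{dpvs}}(1^\lambda, N_t, \mathsf{param}_{\mathbb{G}})$,
  $X_t := (\chi_{t,i,j})_{i,j=1,\ldots,N_t} \xleftarrow{\mathsf{U}} GL(N_t, \mathbb{F}_q)$,
  $X^*_t := (\vartheta_{t,i,j})_{i,j=1,\ldots,N_t} := \psi \cdot (X_t^{\mathrm{T}})^{-1}$, hereafter, $\vec{\chi}_{t,i}$ and $\vec{\vartheta}_{t,i}$
  denote the $i$-th rows of $X_t$ and $X^*_t$ for $i = 1, \ldots, N_t$, respectively,
  $\boldsymbol{b}_{t,i} := (\vec{\chi}_{t,i})_{\mathbb{A}_t} = \sum_{j=1}^{N_t} \chi_{t,i,j} \boldsymbol{a}_{t,j}$ for $i = 1, \ldots, N_t$, $\mathbb{B}_t := (\boldsymbol{b}_{t,1}, \ldots, \boldsymbol{b}_{t,N_t})$,
  $\boldsymbol{b}^*_{t,i} := (\vec{\vartheta}_{t,i})_{\mathbb{A}_t} = \sum_{j=1}^{N_t} \vartheta_{t,i,j} \boldsymbol{a}_{t,j}$ for $i = 1, \ldots, N_t$, $\mathbb{B}^*_t := (\boldsymbol{b}^*_{t,1}, \ldots, \boldsymbol{b}^*_{t,N_t})$,
  $g_T := e(G, G)^{\psi}$, $\mathsf{param} := (\{\mathsf{param}_{\mathbb{V}_t}\}_{t=0,1}, g_T)$, return $(\mathsf{param}, \{\mathbb{B}_t, \mathbb{B}^*_t\}_{t=0,1})$.

We note that $g_T = e(\boldsymbol{b}_{t,i}, \boldsymbol{b}^*_{t,i})$ for $t = 0, 1$; $i = 1, \ldots, N_t$. Hereafter, for simplicity,
we denote $N := N_1$, $\mathbb{V} := \mathbb{V}_1$, $\mathbb{A} := \mathbb{A}_1$, $\mathbb{B} := \mathbb{B}_1$ and $\mathbb{B}^* := \mathbb{B}^*_1$ for variables with
$t = 1$.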
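
The linear algebra behind the dual orthonormal basis generator can be sketched without any pairing group by working only with coefficient matrices over a small prime field. In this simplification (an assumption of the sketch, not the paper's construction), row $i$ of `X` stands for $\boldsymbol{b}_i$, row $j$ of `X_star` stands for $\boldsymbol{b}^*_j$, and the pairing $e(\boldsymbol{b}_i, \boldsymbol{b}^*_j)$ is modelled by the inner product of the two rows, which should equal $\psi$ if $i = j$ and 0 otherwise. The function names and the toy modulus are hypothetical.

```python
import random

def mat_inverse_mod(matrix, q):
    """Gauss-Jordan inverse over F_q; raises ValueError if the matrix is singular."""
    n = len(matrix)
    aug = [[v % q for v in row] + [int(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col]), None)
        if pivot is None:
            raise ValueError("singular matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, q)          # modular inverse (Python 3.8+)
        aug[col] = [v * inv % q for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(a - factor * b) % q for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def dual_bases(n, q, rng):
    """Return (psi, X, X_star) with X_star = psi * (X^T)^{-1} over F_q."""
    psi = rng.randrange(1, q)
    while True:
        x = [[rng.randrange(q) for _ in range(n)] for _ in range(n)]
        try:
            x_t_inv = mat_inverse_mod([list(col) for col in zip(*x)], q)
        except ValueError:
            continue                              # resample until X is invertible
        return psi, x, [[psi * v % q for v in row] for row in x_t_inv]

def inner(a, b, q):
    return sum(u * v for u, v in zip(a, b)) % q

psi, X, X_star = dual_bases(5, 101, random.Random(1))
```

The duality follows from $X \cdot (X^*)^{\mathrm{T}} = \psi \cdot X X^{-1} = \psi I$, which is the coefficient-level counterpart of $g_T = e(\boldsymbol{b}_{t,i}, \boldsymbol{b}^*_{t,i})$.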

3 Definitions of Generalized Inner-Product Encryption 
(IPE) and Attribute-Based Encryption (ABE) 

3.1 Generalized Inner-Product Encryption 

This section defines generalized inner product encryption (IPE) and its security. 

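The parameters of generalized inner-product predicates are expressed as a
vector $\vec{x} := \{(t, x_t) \mid t \in I_{\vec{x}}, x_t \in \mathbb{F}_q\} \setminus \{\vec{0}\}$ with finite index set $I_{\vec{x}} \subset \mathbb{N}$ for
encryption and a vector $\vec{v} := \{(t, v_t) \mid t \in I_{\vec{v}}, v_t \in \mathbb{F}_q\} \setminus \{\vec{0}\}$ with finite index set
$I_{\vec{v}} \subset \mathbb{N}$ for a secret key, respectively. Here there are three types of unbounded
IPE with respect to the decryption condition. For Type 1, $R(\vec{v}, \vec{x}) = 1$ iff $I_{\vec{v}} \subseteq I_{\vec{x}}$
and $\sum_{t \in I_{\vec{v}}} v_t x_t = 0$. For Type 2, $R(\vec{v}, \vec{x}) = 1$ iff $I_{\vec{v}} \supseteq I_{\vec{x}}$ and $\sum_{t \in I_{\vec{x}}} v_t x_t = 0$.

We will consider Type 0 inner-product predicate only for conventional prefix
type vectors $\vec{v} := (v_1, \ldots, v_n)$ and $\vec{x} := (x_1, \ldots, x_{n'})$. For Type 0, $R(\vec{v}, \vec{x}) = 1$ iff
$n = n'$ and $\vec{v} \cdot \vec{x} := \sum_{t=1}^{n} v_t x_t = 0$.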
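
The three decryption conditions can be written down as a short predicate. In this sketch, vectors are represented as dictionaries mapping an index $t$ to a value in $\mathbb{F}_q$; this representation, the function names and the toy modulus used in the example are assumptions of the illustration. For Type 0 the sketch additionally assumes prefix-type index sets $\{1, \ldots, n\}$ and $\{1, \ldots, n'\}$, so that equality of index sets is the same as $n = n'$.

```python
def inner_product(v, x, indices, q):
    """Sum of v_t * x_t over the given index set, reduced mod q."""
    return sum(v[t] * x[t] for t in indices) % q

def relation(ipe_type, v, x, q):
    """Evaluate R(v, x) for generalized IPE of Type 0, 1 or 2.

    v, x: dicts mapping index t -> value in F_q (assumed representation).
    """
    iv, ix = set(v), set(x)
    if ipe_type == 1:   # I_v is a subset of I_x, sum over I_v
        return iv <= ix and inner_product(v, x, iv, q) == 0
    if ipe_type == 2:   # I_v is a superset of I_x, sum over I_x
        return iv >= ix and inner_product(v, x, ix, q) == 0
    if ipe_type == 0:   # prefix-type vectors of equal dimension
        return iv == ix and inner_product(v, x, iv, q) == 0
    raise ValueError("ipe_type must be 0, 1 or 2")

q = 11
v = {1: 1, 3: 2}
x = {1: 2, 2: 5, 3: 10}          # 1*2 + 2*10 = 22 = 0 mod 11 over I_v = {1, 3}
type1_ok = relation(1, v, x, q)  # I_v is contained in I_x
type2_ok = relation(2, v, x, q)  # fails: I_v does not contain I_x
```

The same pair of vectors can therefore satisfy the Type 1 relation and fail the Type 2 relation purely because of how the index sets nest.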

Definition 3. An inner product encryption scheme (for generalized inner-product
relation $R(\vec{v}, \vec{x})$) consists of probabilistic polynomial-time algorithms
Setup, KeyGen, Enc and Dec. They are given as follows:

Setup takes as input security parameter $1^\lambda$. It outputs public parameters pk and
(master) secret key sk.

KeyGen takes as input public parameters pk, secret key sk, and vector $\vec{v}$. It
outputs a corresponding secret key $\mathsf{sk}_{\vec{v}}$.

Enc takes as input public parameters pk, message $m$ in some associated message
space, msg, and vector $\vec{x}$. It returns ciphertext $\mathsf{ct}_{\vec{x}}$.

Dec takes as input the master public key pk, secret key $\mathsf{sk}_{\vec{v}}$ and ciphertext $\mathsf{ct}_{\vec{x}}$.
It outputs either $m' \in \mathsf{msg}$ or the distinguished symbol $\bot$.

A generalized IPE scheme should have the following correctness property: for all
$(\mathsf{pk}, \mathsf{sk}) \xleftarrow{\mathsf{R}} \mathsf{Setup}(1^\lambda)$, all vectors $\vec{v}$ and $\vec{x}$, all secret keys $\mathsf{sk}_{\vec{v}} \xleftarrow{\mathsf{R}} \mathsf{KeyGen}(\mathsf{pk}, \mathsf{sk}, \vec{v})$,
all messages $m$, all ciphertexts $\mathsf{ct}_{\vec{x}} \xleftarrow{\mathsf{R}} \mathsf{Enc}(\mathsf{pk}, m, \vec{x})$, it holds that $m =
\mathsf{Dec}(\mathsf{pk}, \mathsf{sk}_{\vec{v}}, \mathsf{ct}_{\vec{x}})$ if $R(\vec{v}, \vec{x}) = 1$. Otherwise, it holds with negligible probability.

Definition 4. The model for defining the adaptively fully-attribute-hiding security
of IPE against adversary $\mathcal{A}$ (under chosen plaintext attacks) is given by the
following game:

Setup. The challenger runs the setup algorithm, $(\mathsf{pk}, \mathsf{sk}) \xleftarrow{\mathsf{R}} \mathsf{Setup}(1^\lambda)$, and gives
public parameters pk to $\mathcal{A}$.

Phase 1. $\mathcal{A}$ may adaptively make a polynomial number of key queries for vectors,
$\vec{v}$, to the challenger. In response, the challenger gives the corresponding
key $\mathsf{sk}_{\vec{v}} \xleftarrow{\mathsf{R}} \mathsf{KeyGen}(\mathsf{pk}, \mathsf{sk}, \vec{v})$ to $\mathcal{A}$.

Challenge. $\mathcal{A}$ submits challenge vectors $(\vec{x}^{(0)}, \vec{x}^{(1)})$ with the same index set
$I_{\vec{x}^{(0)}} = I_{\vec{x}^{(1)}}$ (or $n'^{(0)} = n'^{(1)}$ for Type 0) and challenge messages $(m^{(0)}, m^{(1)})$,
subject to the following restrictions:

- Any key query $\vec{v}$ in Phase 1 satisfies $R(\vec{v}, \vec{x}^{(0)}) = R(\vec{v}, \vec{x}^{(1)}) = 0$, or
- Two challenge messages are equal, i.e., $m^{(0)} = m^{(1)}$, and any key query
  $\vec{v}$ in Phase 1 satisfies $R(\vec{v}, \vec{x}^{(0)}) = R(\vec{v}, \vec{x}^{(1)})$.

The challenger flips a coin $b \xleftarrow{\mathsf{U}} \{0, 1\}$, and gives $\mathsf{ct}_{\vec{x}^{(b)}} \xleftarrow{\mathsf{R}} \mathsf{Enc}(\mathsf{pk}, m^{(b)}, \vec{x}^{(b)})$
to $\mathcal{A}$.

Phase 2. Phase 1 is repeated with the above restriction for key query $\vec{v}$ and
challenge $(\vec{x}^{(0)}, m^{(0)})$ and $(\vec{x}^{(1)}, m^{(1)})$.

Guess. $\mathcal{A}$ outputs a bit $b'$, and wins if $b' = b$.

The advantage of $\mathcal{A}$ in the above game is defined as $\mathsf{Adv}^{\mathsf{IPE,AH}}_{\mathcal{A}}(\lambda) := \Pr[\mathcal{A} \text{ wins}] -
1/2$ for any security parameter $\lambda$. An IPE scheme is adaptively fully-attribute-hiding
(AH) against chosen plaintext attacks if all probabilistic polynomial-time
adversaries $\mathcal{A}$ have at most negligible advantage in the above game. For each
run of the game, the variable $s$ is defined as $s := 0$ if $m^{(0)} \neq m^{(1)}$ for challenge
messages $m^{(0)}$ and $m^{(1)}$, and $s := 1$ otherwise.

3.2 Attribute-Based Encryption with Non-monotone Access 
Structures 

Span Programs and Non-Monotone Access Structures 

Definition 5 (Span Programs 0). Let $\{p_1, \ldots, p_n\}$ be a set of variables. A
span program over $\mathbb{F}_q$ is a labeled matrix $\hat{M} := (M, \rho)$ where $M$ is a ($\ell \times r$) matrix
over $\mathbb{F}_q$ and $\rho$ is a labeling of the rows of $M$ by literals from $\{p_1, \ldots, p_n, \neg p_1, \ldots,
\neg p_n\}$ (every row is labeled by one literal), i.e., $\rho : \{1, \ldots, \ell\} \to \{p_1, \ldots, p_n, \neg p_1,
\ldots, \neg p_n\}$.

A span program accepts or rejects an input by the following criterion. For
every input sequence $\delta \in \{0, 1\}^n$ define the submatrix $M_\delta$ of $M$ consisting of
those rows whose labels are set to 1 by the input $\delta$, i.e., either rows labeled by
some $p_i$ such that $\delta_i = 1$ or rows labeled by some $\neg p_i$ such that $\delta_i = 0$. (i.e.,
$\gamma : \{1, \ldots, \ell\} \to \{0, 1\}$ is defined by $\gamma(j) = 1$ if $[\rho(j) = p_i] \wedge [\delta_i = 1]$ or
$[\rho(j) = \neg p_i] \wedge [\delta_i = 0]$, and $\gamma(j) = 0$ otherwise. $M_\delta := (M_j)_{\gamma(j)=1}$, where $M_j$ is
the $j$-th row of $M$.)

The span program $\hat{M}$ accepts $\delta$ if and only if $\vec{1} \in \mathrm{span}\langle M_\delta \rangle$, i.e., some linear
combination of the rows of $M_\delta$ gives the all one vector $\vec{1}$. (The row vector has
the value 1 in each coordinate.) A span program computes a Boolean function $f$
if it accepts exactly those inputs $\delta$ where $f(\delta) = 1$.

A span program is called monotone if the labels of the rows are only the positive
literals $\{p_1, \ldots, p_n\}$. Monotone span programs compute monotone functions. (So,
a span program in general is "non"-monotone.)

We assume that no row $M_i$ ($i = 1, \ldots, \ell$) of the matrix $M$ is $\vec{0}$. We now introduce
a non-monotone access structure with evaluating map $\gamma$ that is employed in the
proposed attribute-based encryption schemes.

Definition 6 (Access Structures). $\mathcal{U}_t$ ($t = 1, \ldots, d$ and $\mathcal{U}_t \subset \{0, 1\}^*$) is a
sub-universe, a set of attributes, each of which is expressed by a pair of sub-universe
id and value of attribute, i.e., $(t, v)$, where $t \in \{1, \ldots, d\}$ and $v \in \mathbb{F}_q$.

We now define such an attribute to be a variable $p$ of a span program $\hat{M} :=
(M, \rho)$, i.e., $p := (t, v)$. An access structure $\mathbb{S}$ is span program $\hat{M} := (M, \rho)$
along with variables $p := (t, v), p' := (t', v'), \ldots$, i.e., $\mathbb{S} := (M, \rho)$ such that
$\rho : \{1, \ldots, \ell\} \to \{(t, v), (t', v'), \ldots, \neg(t, v), \neg(t', v'), \ldots\}$.

Let $\Gamma$ be a set of attributes, i.e., $\Gamma := \{(t, x_t) \mid x_t \in \mathbb{F}_q, 1 \le t \le d\}$, where
$1 \le t \le d$ means that $t$ is an element of some subset of $\{1, \ldots, d\}$.

When $\Gamma$ is given to access structure $\mathbb{S}$, map $\gamma : \{1, \ldots, \ell\} \to \{0, 1\}$ for span
program $\hat{M} := (M, \rho)$ is defined as follows: For $i = 1, \ldots, \ell$, set $\gamma(i) = 1$ if

$[\rho(i) = (t, v_i)] \wedge [(t, x_t) \in \Gamma] \wedge [v_i = x_t]$ or $[\rho(i) = \neg(t, v_i)] \wedge [(t, x_t) \in \Gamma] \wedge [v_i \neq x_t]$.

Set $\gamma(i) = 0$ otherwise.

Access structure $\mathbb{S} := (M, \rho)$ accepts $\Gamma$ iff $\vec{1} \in \mathrm{span}\langle (M_i)_{\gamma(i)=1} \rangle$.
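
The evaluating map $\gamma$ and the acceptance test can be sketched as follows. Each row label is represented as a triple `(positive, t, v)` and the attribute set as a dictionary from sub-universe id $t$ to value $x_t$; this encoding, the helper names and the toy modulus are assumptions of the sketch. Acceptance is decided by solving for a linear combination of the selected rows that gives the all-one vector over $\mathbb{F}_q$.

```python
def gamma(labels, attrs):
    """labels[i] = (positive, t, v); attrs maps t -> x_t. Returns the 0/1 map."""
    out = []
    for positive, t, v in labels:
        if t not in attrs:
            out.append(0)                        # (t, x_t) must be present in the set
        elif positive:
            out.append(int(attrs[t] == v))       # rho(i) = (t, v_i) and v_i = x_t
        else:
            out.append(int(attrs[t] != v))       # rho(i) = not (t, v_i) and v_i != x_t
    return out

def solve_combination(rows, target, q):
    """Return alpha with sum(alpha[i] * rows[i]) == target over F_q, or None."""
    k = len(rows)
    # One linear equation per column of the span program matrix.
    eqs = [[row[c] % q for row in rows] + [target[c] % q] for c in range(len(target))]
    pivots, r = [], 0
    for col in range(k):
        p = next((i for i in range(r, len(eqs)) if eqs[i][col]), None)
        if p is None:
            continue
        eqs[r], eqs[p] = eqs[p], eqs[r]
        inv = pow(eqs[r][col], -1, q)
        eqs[r] = [v * inv % q for v in eqs[r]]
        for i in range(len(eqs)):
            if i != r and eqs[i][col]:
                f = eqs[i][col]
                eqs[i] = [(a - f * b) % q for a, b in zip(eqs[i], eqs[r])]
        pivots.append(col)
        r += 1
    if any(eq[-1] for eq in eqs[r:]):
        return None                              # target lies outside the span
    alpha = [0] * k                              # free variables are set to 0
    for row_idx, col in enumerate(pivots):
        alpha[col] = eqs[row_idx][-1]
    return alpha

def accepts(matrix, labels, attrs, q):
    rows = [row for row, bit in zip(matrix, gamma(labels, attrs)) if bit]
    return solve_combination(rows, [1] * len(matrix[0]), q) is not None

# Example policy: (t=1, v=5) AND NOT (t=2, v=7), as a 2 x 2 span program.
M = [[1, 0], [0, 1]]
labels = [(True, 1, 5), (False, 2, 7)]
```

Note that a negated row only fires when the sub-universe $t$ is present in $\Gamma$ with a different value; an absent sub-universe gives $\gamma(i) = 0$, as in the definition.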

We now construct a secret-sharing scheme for a non-monotone access structure 
or span program. 

Definition 7. A secret-sharing scheme for span program $\hat{M} := (M, \rho)$ is:

1. Let $M$ be an $\ell \times r$ matrix. Let column vector $\vec{f}^{\mathrm{T}} := (f_1, \ldots, f_r)^{\mathrm{T}} \xleftarrow{\mathsf{U}} \mathbb{F}_q^r$. Then,
$s_0 := \vec{1} \cdot \vec{f}^{\mathrm{T}} = \sum_{k=1}^{r} f_k$ is the secret to be shared, and $\vec{s}^{\mathrm{T}} := (s_1, \ldots, s_\ell)^{\mathrm{T}} :=
M \cdot \vec{f}^{\mathrm{T}}$ is the vector of $\ell$ shares of the secret $s_0$ and the share $s_i$ belongs to
$\rho(i)$.
2. If span program $\hat{M} := (M, \rho)$ accepts $\delta$, or access structure $\mathbb{S} := (M, \rho)$ accepts
$\Gamma$, i.e., $\vec{1} \in \mathrm{span}\langle (M_i)_{\gamma(i)=1} \rangle$ with $\gamma : \{1, \ldots, \ell\} \to \{0, 1\}$, then there exist
constants $\{\alpha_i \in \mathbb{F}_q \mid i \in I\}$ such that $I \subseteq \{i \in \{1, \ldots, \ell\} \mid \gamma(i) = 1\}$ and
$\sum_{i \in I} \alpha_i s_i = s_0$. Furthermore, these constants $\{\alpha_i\}$ can be computed in time
polynomial in the size of matrix $M$.
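
The sharing and reconstruction steps of this definition can be checked numerically on a small example. The matrix below, which encodes $p_1 \vee (p_2 \wedge p_3)$, the hand-picked reconstruction constants and the toy modulus are assumptions of this sketch; the constants are chosen so that $\sum_i \alpha_i M_i = \vec{1}$ for each authorized set.

```python
import random

def share(matrix, q, rng):
    """Sample f uniformly, return (s0, shares) with s0 = 1 . f and shares = M . f."""
    r = len(matrix[0])
    f = [rng.randrange(q) for _ in range(r)]
    s0 = sum(f) % q
    shares = [sum(m * x for m, x in zip(row, f)) % q for row in matrix]
    return s0, shares

def reconstruct(alpha, shares, q):
    """Recombine shares with constants alpha satisfying sum(alpha_i * M_i) = 1."""
    return sum(a * s for a, s in zip(alpha, shares)) % q

q = 101
M = [[1, 1],   # row for p1: already equals the all-one vector
     [0, 1],   # row for p2
     [1, 0]]   # row for p3: rows 2 and 3 add up to the all-one vector
s0, shares = share(M, q, random.Random(3))
from_p1 = reconstruct([1, 0, 0], shares, q)
from_p2_p3 = reconstruct([0, 1, 1], shares, q)
```

Reconstruction works because $\sum_i \alpha_i s_i = (\sum_i \alpha_i M_i) \cdot \vec{f}^{\mathrm{T}} = \vec{1} \cdot \vec{f}^{\mathrm{T}} = s_0$.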

Key-Policy Attribute-Based Encryption. In key-policy attribute-based encryption
(KP-ABE), encryption (resp. a secret key) is associated with attributes
$\Gamma$ (resp. access structure $\mathbb{S}$). Relation $R$ for KP-ABE is defined as $R(\mathbb{S}, \Gamma) = 1$
iff access structure $\mathbb{S}$ accepts $\Gamma$.

Definition 8 (Key-Policy Attribute-Based Encryption: KP-ABE). A
key-policy attribute-based encryption scheme consists of probabilistic polynomial-time
algorithms Setup, KeyGen, Enc and Dec. They are given as follows:

Setup takes as input security parameter $1^\lambda$. It outputs public parameters pk and
master secret key sk.

KeyGen takes as input public parameters pk, master secret key sk, and access
structure $\mathbb{S} := (M, \rho)$. It outputs a corresponding secret key $\mathsf{sk}_{\mathbb{S}}$.

Enc takes as input public parameters pk, message $m$ in some associated message
space msg, and a set of attributes, $\Gamma := \{(t, x_t) \mid x_t \in \mathbb{F}_q, 1 \le t \le d\}$. It
outputs a ciphertext $\mathsf{ct}_\Gamma$.

Dec takes as input public parameters pk, secret key $\mathsf{sk}_{\mathbb{S}}$ for access structure $\mathbb{S}$,
and ciphertext $\mathsf{ct}_\Gamma$ that was encrypted under a set of attributes $\Gamma$. It outputs
either $m' \in \mathsf{msg}$ or the distinguished symbol $\bot$.

A KP-ABE scheme should have the following correctness property: for all
$(\mathsf{pk}, \mathsf{sk}) \xleftarrow{\mathsf{R}} \mathsf{Setup}(1^\lambda)$, all access structures $\mathbb{S}$, all secret keys $\mathsf{sk}_{\mathbb{S}} \xleftarrow{\mathsf{R}} \mathsf{KeyGen}(\mathsf{pk},
\mathsf{sk}, \mathbb{S})$, all messages $m$, all attribute sets $\Gamma$, all ciphertexts $\mathsf{ct}_\Gamma \xleftarrow{\mathsf{R}} \mathsf{Enc}(\mathsf{pk}, m, \Gamma)$,
it holds that $m = \mathsf{Dec}(\mathsf{pk}, \mathsf{sk}_{\mathbb{S}}, \mathsf{ct}_\Gamma)$ if $\mathbb{S}$ accepts $\Gamma$. Otherwise, it holds with
negligible probability.

Definition 9. The model for defining the adaptively payload-hiding security of
KP-ABE under chosen plaintext attack is given by the following game:

Setup. The challenger runs the setup algorithm, $(\mathsf{pk}, \mathsf{sk}) \xleftarrow{\mathsf{R}} \mathsf{Setup}(1^\lambda)$, and gives
public parameters pk to the adversary.

Phase 1. The adversary is allowed to adaptively issue a polynomial number of
key queries, $\mathbb{S}$, to the challenger. The challenger gives $\mathsf{sk}_{\mathbb{S}} \xleftarrow{\mathsf{R}} \mathsf{KeyGen}(\mathsf{pk}, \mathsf{sk}, \mathbb{S})$
to the adversary.

Challenge. The adversary submits two messages $m^{(0)}, m^{(1)}$ and a set of attributes,
$\Gamma$, provided that no $\mathbb{S}$ queried to the challenger in Phase 1 accepts
$\Gamma$. The challenger flips a coin $b \xleftarrow{\mathsf{U}} \{0, 1\}$, and computes $\mathsf{ct}^{(b)}_\Gamma \xleftarrow{\mathsf{R}}
\mathsf{Enc}(\mathsf{pk}, m^{(b)}, \Gamma)$. It gives $\mathsf{ct}^{(b)}_\Gamma$ to the adversary.

Phase 2. Phase 1 is repeated with the restriction that no queried $\mathbb{S}$ accepts challenge
$\Gamma$.

Guess. The adversary outputs a guess $b'$ of $b$, and wins if $b' = b$.

The advantage of adversary $\mathcal{A}$ in the above game is defined as $\mathsf{Adv}^{\mathsf{KP\text{-}ABE,PH}}_{\mathcal{A}}(\lambda) :=
\Pr[\mathcal{A} \text{ wins}] - 1/2$ for any security parameter $\lambda$. A KP-ABE scheme is adaptively
payload-hiding secure if all polynomial time adversaries have at most a negligible
advantage in the above game.

4 Proposed IPE Schemes 

4.1 Type 1 IPE Scheme 

Construction Idea for Our Type 1 and 2 IPE Schemes. In the existing
constructions [0, 0, EMi of IPE on DPVS, around $cn$ ($c > 1$) dimensional
vector spaces are used for $n$-dimensional attribute and predicate vectors. Here,
the vectors are encoded in an $n$-dimensional subspace. Although this is a typical
strategy of constructing IPE on DPVS, we cannot employ this idea in the
unbounded setting, where we can use only constant dimensional spaces. In our
construction, each component $x_t$ of $\vec{x}$ (resp. $v_t$ of $\vec{v}$) is encoded in a constant
dimensional space. In order to meet the decryption condition, we employ the
indexing technique and an $n$-out-of-$n$ secret sharing trick. For example, in the Type 1
construction, 4-dimensional vector $(\mu_t(t, -1), \delta v_t, s_t)$ is encoded in key $\boldsymbol{k}^*_t$, and
$(\sigma_t(1, t), \omega x_t, \tilde{\omega})$ is encoded in ciphertext $\boldsymbol{c}_t$. The first 2 dimensions are used for
indexes, and $s_t$ in the fourth component of $\boldsymbol{k}^*_t$ is for the secret sharing. Informally,
a ciphertext can be decrypted if all $n$ pieces of shares $s_t$ are recovered. A
Type 2 IPE scheme can be constructed from our Type 1 scheme by setting the
secret-sharing mechanism in the ciphertext side instead of the secret key side.


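Construction of Type 1 IPE

$\mathsf{Setup}(1^\lambda)$ : $(\mathsf{param}, (\mathbb{B}_0, \mathbb{B}^*_0), (\mathbb{B}, \mathbb{B}^*)) \xleftarrow{\mathsf{R}} \mathcal{G}_{\mathsf{ob}}(1^\lambda, (N_0 := 5, N := 15))$,
  $\widehat{\mathbb{B}}_0 := (\boldsymbol{b}_{0,1}, \boldsymbol{b}_{0,3}, \boldsymbol{b}_{0,5})$, $\widehat{\mathbb{B}} := (\boldsymbol{b}_1, \ldots, \boldsymbol{b}_4, \boldsymbol{b}_{14}, \boldsymbol{b}_{15})$,
  $\widehat{\mathbb{B}}^*_0 := (\boldsymbol{b}^*_{0,1}, \boldsymbol{b}^*_{0,3}, \boldsymbol{b}^*_{0,4})$, $\widehat{\mathbb{B}}^* := (\boldsymbol{b}^*_1, \ldots, \boldsymbol{b}^*_4, \boldsymbol{b}^*_{12}, \boldsymbol{b}^*_{13})$,
  return $\mathsf{pk} := (1^\lambda, \mathsf{param}, \widehat{\mathbb{B}}_0, \widehat{\mathbb{B}})$, $\mathsf{sk} := (\widehat{\mathbb{B}}^*_0, \widehat{\mathbb{B}}^*)$.

$\mathsf{KeyGen}(\mathsf{pk}, \mathsf{sk}, \vec{v} := \{(t, v_t) \mid t \in I_{\vec{v}}\})$ : $s_t, \delta, \eta_0 \xleftarrow{\mathsf{U}} \mathbb{F}_q$ for $t \in I_{\vec{v}}$,
  $s_0 := \sum_{t \in I_{\vec{v}}} s_t$, $\boldsymbol{k}^*_0 := (-s_0, 0, 1, \eta_0, 0)_{\mathbb{B}^*_0}$,
  for $t \in I_{\vec{v}}$, $\mu_t, \eta_{t,1}, \eta_{t,2} \xleftarrow{\mathsf{U}} \mathbb{F}_q$,
  $\boldsymbol{k}^*_t := (\overbrace{\mu_t(t, -1), \delta v_t, s_t}^{4}, \overbrace{0^7}^{7}, \overbrace{\eta_{t,1}, \eta_{t,2}}^{2}, \overbrace{0^2}^{2})_{\mathbb{B}^*}$,
  return $\mathsf{sk}_{\vec{v}} := (I_{\vec{v}}, \boldsymbol{k}^*_0, \{\boldsymbol{k}^*_t\}_{t \in I_{\vec{v}}})$.

$\mathsf{Enc}(\mathsf{pk}, m, \vec{x} := \{(t, x_t) \mid t \in I_{\vec{x}}\})$ : $\omega, \tilde{\omega}, \zeta, \varphi_0 \xleftarrow{\mathsf{U}} \mathbb{F}_q$,
  $\boldsymbol{c}_0 := (\tilde{\omega}, 0, \zeta, 0, \varphi_0)_{\mathbb{B}_0}$, $c_T := g_T^{\zeta} m$,
  for $t \in I_{\vec{x}}$, $\sigma_t, \varphi_{t,1}, \varphi_{t,2} \xleftarrow{\mathsf{U}} \mathbb{F}_q$,
  $\boldsymbol{c}_t := (\overbrace{\sigma_t(1, t), \omega x_t, \tilde{\omega}}^{4}, \overbrace{0^7}^{7}, \overbrace{0^2}^{2}, \overbrace{\varphi_{t,1}, \varphi_{t,2}}^{2})_{\mathbb{B}}$,
  return $\mathsf{ct}_{\vec{x}} := (I_{\vec{x}}, \boldsymbol{c}_0, \{\boldsymbol{c}_t\}_{t \in I_{\vec{x}}}, c_T)$.

$\mathsf{Dec}(\mathsf{pk}, \mathsf{sk}_{\vec{v}} := (I_{\vec{v}}, \boldsymbol{k}^*_0, \{\boldsymbol{k}^*_t\}_{t \in I_{\vec{v}}}), \mathsf{ct}_{\vec{x}} := (I_{\vec{x}}, \boldsymbol{c}_0, \{\boldsymbol{c}_t\}_{t \in I_{\vec{x}}}, c_T))$ :
  if $I_{\vec{v}} \subseteq I_{\vec{x}}$, $K := e(\boldsymbol{c}_0, \boldsymbol{k}^*_0) \cdot \prod_{t \in I_{\vec{v}}} e(\boldsymbol{c}_t, \boldsymbol{k}^*_t)$, return $m' := c_T / K$,
  else return $\bot$.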

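[Correctness] If $I_{\vec{v}} \subseteq I_{\vec{x}}$ and $\sum_{t \in I_{\vec{v}}} v_t x_t = 0$, then
$e(\boldsymbol{c}_0, \boldsymbol{k}^*_0) \cdot \prod_{t \in I_{\vec{v}}} e(\boldsymbol{c}_t, \boldsymbol{k}^*_t) = g_T^{-\tilde{\omega} s_0 + \zeta} \cdot \prod_{t \in I_{\vec{v}}} g_T^{\delta \omega v_t x_t + \tilde{\omega} s_t} = g_T^{-\tilde{\omega} s_0 + \zeta + \tilde{\omega} s_0} = g_T^{\zeta}$.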

Theorem 1. The proposed Type 1 IPE scheme is adaptively fully-attribute-hiding
against chosen plaintext attacks under the DLIN assumption.

The proof of Theorem 1 is given in the full version of this paper.


4.2 Type 0 IPE Scheme 

Construction Idea for Our Type 0 IPE Scheme. In the Type 1 construction,
4-dimensional vector $(\mu_t(t, -1), \delta v_t, s_t)$ is encoded in key $\boldsymbol{k}^*_t$, and $(\sigma_t(1, t), \omega x_t,
\tilde{\omega})$ is encoded in ciphertext $\boldsymbol{c}_t$. Here, the secret-sharing values, $s_t$ for $t \in I_{\vec{v}}$, in $\boldsymbol{k}^*_t$ are
used to assure one of the decryption conditions, $I_{\vec{v}} \subseteq I_{\vec{x}}$. In the Type 0 scheme, to
achieve its decryption condition $I_{\vec{v}} = I_{\vec{x}}$ for $\vec{v} := (v_1, \ldots, v_n)$, $\vec{x} := (x_1, \ldots, x_{n'})$,
which is equivalent to $n = n'$, we also apply the above mechanism to the ciphertext
side. Then, in our Type 0 scheme, we encode 5-dimensional $(\mu_t(t, -1), \delta v_t, s_t, \tilde{\delta})$
in the first part of $\boldsymbol{k}^*_t$, and $(\sigma_t(1, t), \omega x_t, \tilde{\omega}, f_t)$ in the first part of $\boldsymbol{c}_t$ with random
$\mu_t, \sigma_t, \omega, \tilde{\omega}, \delta, \tilde{\delta}, s_t, f_t \xleftarrow{\mathsf{U}} \mathbb{F}_q$.


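Construction of Type 0 IPE

$\mathsf{Setup}(1^\lambda)$ : $(\mathsf{param}, (\mathbb{B}_0, \mathbb{B}^*_0), (\mathbb{B}, \mathbb{B}^*)) \xleftarrow{\mathsf{R}} \mathcal{G}_{\mathsf{ob}}(1^\lambda, (N_0 := 9, N := 21))$,
  $\widehat{\mathbb{B}}_0 := (\boldsymbol{b}_{0,1}, \boldsymbol{b}_{0,2}, \boldsymbol{b}_{0,5}, \boldsymbol{b}_{0,8}, \boldsymbol{b}_{0,9})$, $\widehat{\mathbb{B}} := (\boldsymbol{b}_1, \ldots, \boldsymbol{b}_5, \boldsymbol{b}_{19}, \ldots, \boldsymbol{b}_{21})$,
  $\widehat{\mathbb{B}}^*_0 := (\boldsymbol{b}^*_{0,1}, \boldsymbol{b}^*_{0,2}, \boldsymbol{b}^*_{0,5}, \ldots, \boldsymbol{b}^*_{0,7})$, $\widehat{\mathbb{B}}^* := (\boldsymbol{b}^*_1, \ldots, \boldsymbol{b}^*_5, \boldsymbol{b}^*_{16}, \ldots, \boldsymbol{b}^*_{18})$,
  return $\mathsf{pk} := (1^\lambda, \mathsf{param}, \widehat{\mathbb{B}}_0, \widehat{\mathbb{B}})$, $\mathsf{sk} := (\widehat{\mathbb{B}}^*_0, \widehat{\mathbb{B}}^*)$.

$\mathsf{KeyGen}(\mathsf{pk}, \mathsf{sk}, \vec{v} := (v_1, \ldots, v_n))$ : $s_t, \delta, \tilde{\delta}, \eta_{0,1}, \eta_{0,2} \xleftarrow{\mathsf{U}} \mathbb{F}_q$ for $t = 1, \ldots, n$,
  $s_0 := \sum_{t=1}^{n} s_t$, $\boldsymbol{k}^*_0 := (-s_0, \tilde{\delta}, 0^2, 1, \eta_{0,1}, \eta_{0,2}, 0^2)_{\mathbb{B}^*_0}$,
  for $t = 1, \ldots, n$, $\mu_t, \eta_{t,1}, \ldots, \eta_{t,3} \xleftarrow{\mathsf{U}} \mathbb{F}_q$,
  $\boldsymbol{k}^*_t := (\overbrace{\mu_t(t, -1), \delta v_t, s_t, \tilde{\delta}}^{5}, \overbrace{0^{10}}^{10}, \overbrace{\eta_{t,1}, \ldots, \eta_{t,3}}^{3}, \overbrace{0^3}^{3})_{\mathbb{B}^*}$,
  return $\mathsf{sk}_{\vec{v}} := \{\boldsymbol{k}^*_t\}_{t=0,\ldots,n}$.

$\mathsf{Enc}(\mathsf{pk}, m, \vec{x} := (x_1, \ldots, x_{n'}))$ : $f_t, \omega, \tilde{\omega}, \zeta, \varphi_{0,1}, \varphi_{0,2} \xleftarrow{\mathsf{U}} \mathbb{F}_q$ for $t = 1, \ldots, n'$,
  $f_0 := \sum_{t=1}^{n'} f_t$, $\boldsymbol{c}_0 := (\tilde{\omega}, -f_0, 0^2, \zeta, 0^2, \varphi_{0,1}, \varphi_{0,2})_{\mathbb{B}_0}$, $c_T := g_T^{\zeta} m$,
  for $t = 1, \ldots, n'$, $\sigma_t, \varphi_{t,1}, \ldots, \varphi_{t,3} \xleftarrow{\mathsf{U}} \mathbb{F}_q$,
  $\boldsymbol{c}_t := (\overbrace{\sigma_t(1, t), \omega x_t, \tilde{\omega}, f_t}^{5}, \overbrace{0^{10}}^{10}, \overbrace{0^3}^{3}, \overbrace{\varphi_{t,1}, \ldots, \varphi_{t,3}}^{3})_{\mathbb{B}}$,
  return $\mathsf{ct}_{\vec{x}} := (\{\boldsymbol{c}_t\}_{t=0,\ldots,n'}, c_T)$.

$\mathsf{Dec}(\mathsf{pk}, \mathsf{sk}_{\vec{v}} := \{\boldsymbol{k}^*_t\}_{t=0,\ldots,n}, \mathsf{ct}_{\vec{x}} := (\{\boldsymbol{c}_t\}_{t=0,\ldots,n'}, c_T))$ :
  if $n = n'$, $K := \prod_{t=0}^{n} e(\boldsymbol{c}_t, \boldsymbol{k}^*_t)$, return $m' := c_T / K$, else return $\bot$.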

Correctness of the scheme can be shown in a similar manner to that of our Type 
1 IPE scheme. 

Theorem 2. The proposed Type 0 IPE scheme is adaptively fully- attribute- 
hiding against chosen plaintext attacks under the DLIN assumption. 

The proof of Theorem 2 is given in the full version of this paper.

5 Proposed KP-ABE Scheme (Basic) 

We define function $\tilde{\rho} : \{1, \ldots, \ell\} \to \{1, \ldots, d\}$ by $\tilde{\rho}(i) := t$ if $\rho(i) = (t, v)$ or $\rho(i) =
\neg(t, v)$, where $\rho$ is given in access structure $\mathbb{S} := (M, \rho)$. In the proposed scheme,
we assume that $\tilde{\rho}$ is injective for $\mathbb{S} := (M, \rho)$ in $\mathsf{sk}_{\mathbb{S}}$. For the modified scheme
without such a restriction, see the full version. Let $d := \mathrm{poly}(\lambda)$, where $\mathrm{poly}(\cdot)$
is a polynomial.

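$\mathsf{Setup}(1^\lambda)$ : $(\mathsf{param}, (\mathbb{B}_0, \mathbb{B}^*_0), (\mathbb{B}, \mathbb{B}^*)) \xleftarrow{\mathsf{R}} \mathcal{G}_{\mathsf{ob}}(1^\lambda, (N_0 := 5, N := 14))$,
  $\widehat{\mathbb{B}}_0 := (\boldsymbol{b}_{0,1}, \boldsymbol{b}_{0,3}, \boldsymbol{b}_{0,5})$, $\widehat{\mathbb{B}} := (\boldsymbol{b}_1, \ldots, \boldsymbol{b}_4, \boldsymbol{b}_{13}, \boldsymbol{b}_{14})$,
  $\widehat{\mathbb{B}}^*_0 := (\boldsymbol{b}^*_{0,1}, \boldsymbol{b}^*_{0,3}, \boldsymbol{b}^*_{0,4})$, $\widehat{\mathbb{B}}^* := (\boldsymbol{b}^*_1, \ldots, \boldsymbol{b}^*_4, \boldsymbol{b}^*_{11}, \boldsymbol{b}^*_{12})$,
  return $\mathsf{pk} := (1^\lambda, \mathsf{param}, \widehat{\mathbb{B}}_0, \widehat{\mathbb{B}})$, $\mathsf{sk} := (\widehat{\mathbb{B}}^*_0, \widehat{\mathbb{B}}^*)$.

$\mathsf{KeyGen}(\mathsf{pk}, \mathsf{sk}, \mathbb{S} := (M, \rho))$ : $\vec{f} \xleftarrow{\mathsf{U}} \mathbb{F}_q^r$, $s_0 := \vec{1} \cdot \vec{f}^{\mathrm{T}}$,
  $\vec{s}^{\mathrm{T}} := (s_1, \ldots, s_\ell)^{\mathrm{T}} := M \cdot \vec{f}^{\mathrm{T}}$, $\eta_0 \xleftarrow{\mathsf{U}} \mathbb{F}_q$, $\boldsymbol{k}^*_0 := (-s_0, 0, 1, \eta_0, 0)_{\mathbb{B}^*_0}$,
  for $i = 1, \ldots, \ell$, $\mu_i, \theta_i, \eta_{i,1}, \eta_{i,2} \xleftarrow{\mathsf{U}} \mathbb{F}_q$,
  if $\rho(i) = (t, v_i)$,
    $\boldsymbol{k}^*_i := (\overbrace{\mu_i(t, -1), s_i + \theta_i v_i, -\theta_i}^{4}, \overbrace{0^6}^{6}, \overbrace{\eta_{i,1}, \eta_{i,2}}^{2}, \overbrace{0^2}^{2})_{\mathbb{B}^*}$,
  if $\rho(i) = \neg(t, v_i)$,
    $\boldsymbol{k}^*_i := (\overbrace{\mu_i(t, -1), s_i(v_i, -1)}^{4}, \overbrace{0^6}^{6}, \overbrace{\eta_{i,1}, \eta_{i,2}}^{2}, \overbrace{0^2}^{2})_{\mathbb{B}^*}$,
  return $\mathsf{sk}_{\mathbb{S}} := (\mathbb{S}, \{\boldsymbol{k}^*_i\}_{i=0,\ldots,\ell})$.

$\mathsf{Enc}(\mathsf{pk}, m, \Gamma := \{(t, x_t) \mid 1 \le t \le d\})$ : $\omega, \zeta, \varphi_0 \xleftarrow{\mathsf{U}} \mathbb{F}_q$,
  $\boldsymbol{c}_0 := (\omega, 0, \zeta, 0, \varphi_0)_{\mathbb{B}_0}$, $c_{d+1} := g_T^{\zeta} m$,
  for $(t, x_t) \in \Gamma$, $\sigma_t, \varphi_{t,1}, \varphi_{t,2} \xleftarrow{\mathsf{U}} \mathbb{F}_q$,
  $\boldsymbol{c}_t := (\overbrace{\sigma_t(1, t), \omega(1, x_t)}^{4}, \overbrace{0^6}^{6}, \overbrace{0^2}^{2}, \overbrace{\varphi_{t,1}, \varphi_{t,2}}^{2})_{\mathbb{B}}$,
  return $\mathsf{ct}_\Gamma := (\Gamma, \boldsymbol{c}_0, \{\boldsymbol{c}_t\}_{(t, x_t) \in \Gamma}, c_{d+1})$.

$\mathsf{Dec}(\mathsf{pk}, \mathsf{sk}_{\mathbb{S}} := (\mathbb{S}, \{\boldsymbol{k}^*_i\}_{i=0,\ldots,\ell}), \mathsf{ct}_\Gamma := (\Gamma, \boldsymbol{c}_0, \{\boldsymbol{c}_t\}_{(t, x_t) \in \Gamma}, c_{d+1}))$ :
  If $\mathbb{S} := (M, \rho)$ accepts $\Gamma := \{(t, x_t)\}$, then compute $I$ and $\{\alpha_i\}_{i \in I}$ such that
  $\vec{1} = \sum_{i \in I} \alpha_i M_i$, where $M_i$ is the $i$-th row of $M$, and
  $I \subseteq \{i \in \{1, \ldots, \ell\} \mid [\rho(i) = (t, v_i) \wedge (t, x_t) \in \Gamma \wedge v_i = x_t]$
      $\vee\ [\rho(i) = \neg(t, v_i) \wedge (t, x_t) \in \Gamma \wedge v_i \neq x_t]\}$,
  $K := e(\boldsymbol{c}_0, \boldsymbol{k}^*_0) \cdot \prod_{i \in I \wedge \rho(i) = (t, v_i)} e(\boldsymbol{c}_t, \boldsymbol{k}^*_i)^{\alpha_i} \cdot \prod_{i \in I \wedge \rho(i) = \neg(t, v_i)} e(\boldsymbol{c}_t, \boldsymbol{k}^*_i)^{\alpha_i / (v_i - x_t)}$,
  return $m' := c_{d+1} / K$, else return $\bot$.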

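[Correctness] If $\mathbb{S} := (M, \rho)$ accepts $\Gamma := \{(t, x_t)\}$,

$K = g_T^{-\omega s_0 + \zeta} \prod_{i \in I \wedge \rho(i) = (t, v_i)} g_T^{\omega \alpha_i s_i} \prod_{i \in I \wedge \rho(i) = \neg(t, v_i)} g_T^{\omega \alpha_i s_i (v_i - x_t)/(v_i - x_t)} = g_T^{\omega(-s_0 + \sum_{i \in I} \alpha_i s_i) + \zeta} = g_T^{\zeta}$.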
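
The KP-ABE correctness computation can likewise be replayed on coefficient vectors, with the pairing modelled by an inner product over a toy prime field. The example policy "(t=1, v=5) AND NOT (t=2, v=7)", its $2 \times 2$ matrix, the hand-picked constants $\alpha = (1, 1)$, the label encoding `(positive, t, v)` and all function names are assumptions of this sketch; it checks only the algebra and offers no security.

```python
import random

Q = 2**61 - 1  # toy prime standing in for the group order q (assumption)

def dot(a, b):
    return sum(u * w for u, w in zip(a, b)) % Q

def keygen(matrix, labels, rng):
    """Coefficient vectors of k*_0 and k*_1..k*_l for access structure (M, rho)."""
    f = [rng.randrange(Q) for _ in range(len(matrix[0]))]
    s0 = sum(f) % Q
    shares = [dot(row, f) for row in matrix]
    k0 = [-s0 % Q, 0, 1, rng.randrange(Q), 0]
    keys = []
    for s_i, (positive, t, v) in zip(shares, labels):
        mu, theta = rng.randrange(Q), rng.randrange(Q)
        if positive:   # (s_i + theta_i v_i, -theta_i)
            body = [(s_i + theta * v) % Q, -theta % Q]
        else:          # s_i (v_i, -1)
            body = [s_i * v % Q, -s_i % Q]
        keys.append([mu * t % Q, -mu % Q] + body + [0] * 6
                    + [rng.randrange(Q), rng.randrange(Q)] + [0] * 2)
    return k0, keys

def encrypt(attrs, rng):
    """Return zeta and coefficient vectors of c_0 and {c_t} for attrs = {t: x_t}."""
    omega, zeta = rng.randrange(Q), rng.randrange(Q)
    c0 = [omega, 0, zeta, 0, rng.randrange(Q)]
    cts = {}
    for t, x in attrs.items():
        sigma = rng.randrange(Q)
        # (sigma_t(1, t), omega(1, x_t), 0^6, 0^2, phi_{t,1}, phi_{t,2})
        cts[t] = ([sigma, sigma * t % Q, omega, omega * x % Q] + [0] * 6 + [0] * 2
                  + [rng.randrange(Q), rng.randrange(Q)])
    return zeta, c0, cts

def k_exponent(k0, keys, labels, alpha, c0, cts, attrs):
    """Exponent of g_T in K for reconstruction constants alpha."""
    total = dot(c0, k0)
    for key, (positive, t, v), a in zip(keys, labels, alpha):
        if a == 0:
            continue
        scale = a if positive else a * pow((v - attrs[t]) % Q, -1, Q)
        total += scale * dot(cts[t], key)
    return total % Q

M = [[1, 0], [0, 1]]
labels = [(True, 1, 5), (False, 2, 7)]
attrs = {1: 5, 2: 3}                 # satisfies the policy: x_1 = 5 and x_2 != 7
rng = random.Random(11)
k0, keys = keygen(M, labels, rng)
zeta, c0, cts = encrypt(attrs, rng)
exponent = k_exponent(k0, keys, labels, [1, 1], c0, cts, attrs)
```

For the positive row the $\theta_i$ terms cancel because $v_i = x_t$, and for the negated row the factor $(v_i - x_t)$ is removed by the exponent $\alpha_i / (v_i - x_t)$, leaving $\omega(-s_0 + \sum_{i \in I} \alpha_i s_i) + \zeta = \zeta$.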

Theorem 3. The proposed KP-ABE scheme is adaptively payload-hiding against 
chosen plaintext attacks under the DLIN assumption. 

The proof of Theorem 3 is given in the full version of this paper.

Abstract. Homomorphic signatures are primitives that allow for public
computations on authenticated data. At TCC 2012, Ahn et al. defined a
framework and security notions for such systems. For a predicate P, their
notion of P-homomorphic signature makes it possible, given signatures
on a message set M, to publicly derive a signature on any message m'
such that P(M, m') = 1. Beyond unforgeability, Ahn et al. considered a
strong notion of privacy - called strong context hiding - requiring that 
derived signatures be perfectly indistinguishable from signatures newly 
generated by the signer. In this paper, we first note that the definition 
of strong context hiding may not imply unlinkability properties that can 
be expected from homomorphic signatures in certain situations. We then 
suggest other definitions of privacy and discuss the relations among them. 
Our strongest definition, called complete context hiding security, is shown 
to imply previous ones. In the case of linearly homomorphic signatures, 
we only attain a slightly weaker level of privacy which is nevertheless 
stronger than in previous realizations in the standard model. For subset 
predicates, we prove that our strongest notion of privacy is satisfiable and 
describe a completely context hiding system with constant-size public 
keys. In the standard model, this construction is the first one that allows 
signing messages of arbitrary length. The scheme builds on techniques 
that are very different from those of Ahn et al.

Keywords: Homomorphic signatures, provable security, privacy, un- 
linkability, standard model. 


1 Introduction 

With the advent of fully homomorphic encryption m , much attention has been
paid to the problem of computing on encrypted data (see, e.g., [21, 17]) in
recent years. This also revived the interest of the research community in homomorphic
signatures, which allow for computations on authenticated data.

Informally, a signer has a set of messages $\{m_i\}_{i=1}^{k}$ and generates a corresponding
set of signatures $\{\sigma_i\}_{i=1}^{k}$ with $\sigma_i = \mathsf{Sign}(\mathsf{sk}, m_i)$ for each $i$. The
signed dataset $\{(m_i, \sigma_i)\}_{i=1}^{k}$ is then archived on a remote server. Later on, the
server can publicly compute $(m, \sigma) = \mathsf{Evaluate}(\mathsf{pk}, \{(m_i, \sigma_i)\}_{i=1}^{k}, f)$ such that
$\mathsf{Verify}(\mathsf{pk}, m, \sigma) = 1$, where $m = f(m_1, \ldots, m_k)$ for some function $f$.

In the last decade, the area was investigated by several lines of research: examples
include homomorphic signatures for arithmetic functions [10, 22, 11, 12]
but also redactable signatures |34ll 511 bill] and various other forms of algebraic
signatures [33, 7, 26, 27].

Recently, Ahn et al. [3] defined a framework for computing on signed data.
For a predicate P, their notion of P-homomorphic signature allows anyone who
observes signatures on a message m to publicly derive signatures on messages
m' such that P(m, m') = 1. This framework is geared towards capturing homomorphic
signatures supporting quoting and redacting, arithmetic functions
and more. Ahn et al. [3] gave thorough definitions for the unforgeability of
P-homomorphic signatures. Besides, they introduced a strong notion of privacy,
called strong context hiding, that captures the infeasibility of linking a derived
signature to the signature it was derived from. A scheme is said to be strongly context
hiding when a derived signature is statistically indistinguishable from a freshly
generated signature, even when the original signature is available.

1.1 Related Work 

Homomorphic signatures were first considered by Johnson, Molnar, Song and 
Wagner 122!- Boneh, Freeman, Katz and Waters urn used them to sign vector 
spaces in order to prevent pollution attacks in network coding. They adapted 
the definitions of !H2j to the network coding setting and designed a linearly 
homomorphic scheme in the random oracle model using bilinear maps. Gennaro, 
Katz, Krawczyk and Rabin subsequently described a homomorphic signature 
| 22 | over the integers based on the RSA assumption in the random oracle model. 
Later on, Boneh and Freeman m gave a linearly homomorphic construction 
over binary fields. They also formalized a notion, called weak privacy, which 
requires derived signatures to hide the original dataset they were derived from. 

In the network coding scenario, constructions in the standard model were 
given by Attrapadung and Libert 0 and Catalano, Fiore and Warinschi |17H8j . 
Recently, Freeman j2D| defined a framework for constructing linearly homomor- 
phic signatures satisfying enhanced security properties. In the standard model, 
the framework of m notably provides constructions based on the RSA, Diffie- 
Hellman and Strong Diffie-Hellman assumptions. In the meantime, Boneh and 
Freeman m used lattices to move beyond linear functions and described homo- 
morphic signatures (in the random oracle model) supporting the evaluation of 
multivariate polynomials over signed data. 

Recently, Ahn et al. [3] realized strongly context hiding P-homomorphic
signatures for quoting and subset predicates: a signed message allows deriving
signatures on substrings or arbitrary subsets of that message, respectively.
They also showed that linearly homomorphic signatures fit I'll l~17fZn| give P-homomorphic
signatures allowing for the computation of weighted averages and Fourier trans- 
forms on signed data. The construction of HD! was notably shown strongly con- 
text hiding thanks to its uniqueness of signatures property. 


1.2 Our Contributions 

New Definitions of Privacy. In this paper, we first reconsider the definition 
of strong context hiding security in [3] and point out a subtlety that arises in
the context of randomizable signatures. While the definition of Ahn et al. [3]
aims at perfect indistinguishability, it only considers honestly generated original 
signatures. In specific schemes, signatures may satisfy the verification algorithm 
without being produced by the legitimate signing algorithm. Signatures [3(14123] 
derived from Waters’ dual system encryption technique 133! - which is currently 
the only known way to prove the standard unforgeability property for certain 
predicates - are typical examples. For these constructions, the definition of [3]
does not guarantee the unlinkability when the original signature is adversarially 
chosen (e.g., by re-randomizing original signatures). This may be a concern in 
certain applications. In network coding, suppose that we want to hide the path 
taken by specific packets. If a curious target node colludes with some intermedi- 
ate nodes that maliciously re-randomize signatures on the road, they may infer 
information on the rest of the path downstream. 

To address this issue, we suggest other definitions of unlinkability and discuss 
the relations among them. We first define a security property, called adaptive 
context hiding, that allows for adversarially-generated original signatures. Since 
this definition only asks for computational security, it does not imply strong 
context hiding security (2) : we show examples of schemes that are context hiding 
according to one definition and fall short of satisfying the other one. In order 
to unify these definitions, we thus define a notion of completely context hiding 
homomorphic signature, which requires statistical unlinkability and implies both 
strong and adaptive context hiding properties. 

New Linearly Homomorphic Signatures. Using the dual system tech- 
nique [3bl3()j . we describe a new linearly homomorphic signature and prove it 
(in the standard model) both strongly context hiding and context hiding on 
adversarially-chosen signatures with private key exposure. To our knowledge, 
all previous such schemes fail to simultaneously satisfy both security notions. 
The scheme of 0 is actually the only strongly context hiding realization in 
the standard model but, as we shall see, it is provably not adaptively context 
hiding. Since the new construction is only adaptively context hiding for compu- 
tationally bounded distinguishers, it does not meet our strongest definition. This 
shortcoming seems inherent to all signature schemes [4123] based on the dual sys- 
tem paradigm. We leave it as an open problem to achieve information-theoretic 
unlinkability in that sense without resorting to the random oracle model.

If we settle for weak context hiding security¹ (as in most linearly
homomorphic signatures |HI2()j), a variant of our scheme provides the shortest linearly
homomorphic signature based on a simple assumption in the standard model. At 
the expense of being context hiding in a weaker sense than m, the scheme can 
be proved unforgeable under the standard computational Diffie-Hellman (CDH) 
assumption. Each signature consists of two group elements and one scalar, which 
shortens Freeman’s CDH-based signatures j2D| by about 25%. 

Handling Subset Predicates for Messages of Arbitrary Length. Fi- 
nally, the paper puts forward a new method for dealing with subset predicates. 
Ahn et al. [3] showed how to obtain such signatures from a certain class of
ciphertext-policy attribute-based encryption (CP-ABE) systems, by applying 
a Naor-like transformation |0j. With currently available fully secure CP-ABE 
schemes |2bl35j , this technique is limited to support messages of bounded length: 
the maximal length $n_{\max}$ of original messages must be fixed at key generation
time and public keys comprise at least $O(n_{\max})$ group elements. This limitation
could be avoided using a fully secure unbounded CP-ABE scheme. However, 
no such system is currently available: the only known |3 1 !2iSj unbounded ABE 
constructions to date are selectively secure key-policy ABE schemes. 

To fill this gap, we suggest an alternative design principle which yields 
constant-size public keys and allows signing messages of arbitrary length. Our 
construction departs from the ABE-based approach of [3] and rather uses the
randomizability properties of Groth-Sahai proofs [221 ■ In a nutshell, when origi- 
nal signatures are computed for a set of words {mi, . . . , m„}, the signer generates 
a fresh public key pk', which is certified using the long-term secret key of the 
system, and uses $sk'$ to compute $\sigma_i = \mathsf{Sign}(sk', m_i)$ for each $i$. This construction
is made unlinkable by letting $pk'$ and all signatures $\{\sigma_i\}_{i=1}^n$ appear in committed
form, accompanied with non-interactive witness indistinguishable proofs of their 
validity. The general idea is instantiated by combining the structure-preserving 
signature of 0 with Waters signatures |3B| - which are both partially random- 
izable - in such a way that we only need to manipulate linear pairing product 
equations (in the terminology of E3)- This makes it easy to re-randomize Groth- 
Sahai proofs when deriving signatures. As a result, the system provably satisfies 
our strongest definition of unlinkability. 

We believe this approach to be of interest in its own right for the design of P- 
homomorphic signatures. Indeed, if we compare it with the dual system technique 
it allows us to more easily obtain completely context hiding schemes. 


1.3 Organization 

We first review previous security definitions for P-homomorphic signatures and
introduce new definitions of privacy in Section 2. Section 3 discusses the
relations among these privacy definitions. In Section 4, we describe a new
linearly homomorphic construction, for which a CDH-based weakly context hiding
variant is described in the full version of the paper. Section 5 finally
presents our completely context hiding system for subset predicates.

1 This property relaxes strong context hiding security by only requiring
indistinguishability when the original signatures are not given.

2 Background 

2.1 Definitions for Homomorphic Signatures 

Definition 1 ([3]). Let $\mathcal{M}$ be a message space and $2^{\mathcal{M}}$
be its powerset. Let $P : 2^{\mathcal{M}} \times \mathcal{M} \to \{0, 1\}$ be a
predicate. A message $m'$ is said to be derivable from $M \subset \mathcal{M}$
if $P(M, m') = 1$. As in [3], $P^i(M)$ is the set of messages derivable from
$P^{i-1}(M)$, where $P^0(M) := \{m' \in \mathcal{M} \mid P(M, m') = 1\}$.
Finally, $P^*(M) := \bigcup_{i=0}^{\infty} P^i(M)$ denotes the set of messages
derivable from $M$ by iterated derivation.
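To make the iterated-derivation closure concrete, here is a small sketch that computes P*(M) over a finite message space. The subset predicate used here and the helper names (`derive_once`, `closure`) are assumptions chosen for illustration; the definition itself places no such restriction on P.

```python
from itertools import combinations


def derive_once(P, space, M):
    """Return {m' in space : P(M, m') = 1}."""
    return frozenset(m for m in space if P(M, m))


def closure(P, space, M):
    """Compute P*(M) as the union of P^0(M), P^1(M), ... over a finite space."""
    current = derive_once(P, space, frozenset(M))     # P^0(M)
    result, seen = set(current), {current}
    while True:
        current = derive_once(P, space, current)      # P^i(M) from P^{i-1}(M)
        if current in seen:                           # the sequence has cycled
            return result
        seen.add(current)
        result |= current


def subset_predicate(M, m_prime):
    """Toy subset predicate: m' is derivable if it is contained in some m in M."""
    return any(m_prime <= m for m in M)


universe = "abc"
space = [frozenset(c) for k in range(len(universe) + 1)
         for c in combinations(universe, k)]
derivable = closure(subset_predicate, space, [frozenset("ab")])
print(sorted("".join(sorted(m)) for m in derivable))
```

Because the space is finite, the sequence of sets P^i(M) must eventually repeat, which is the stopping condition used here.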

Definition 2 ([3]). A $P$-homomorphic signature for a predicate
$P : 2^{\mathcal{M}} \times \mathcal{M} \to \{0, 1\}$ is a triple of algorithms
(Keygen, SignDerive, Verify) such that:

Keygen($\lambda$): takes as input a security parameter $\lambda \in \mathbb{N}$
   and outputs a key pair $(sk, pk)$. As in [3], the private key $sk$ is seen as
   a signature on the empty tuple $\varepsilon \in \mathcal{M}$.

SignDerive($pk$, ($\{\sigma_m\}_{m \in M}$, $M$), $m'$): is a possibly
   randomized algorithm that takes as input a public key $pk$, a set of messages
   $M \subset \mathcal{M}$, a corresponding set of signatures
   $\{\sigma_m\}_{m \in M}$ and a derived message $m' \in \mathcal{M}$. If
   $P(M, m') = 0$, it returns $\perp$. Otherwise, it outputs a derived signature
   $\sigma'$.

Verify($pk$, $\sigma$, $m$): is a deterministic algorithm that takes as input a
   public key $pk$, a signature $\sigma$ and a message $m$. It outputs 0 or 1.

Note that the empty tuple $\varepsilon \in \mathcal{M}$ satisfies
$P(\varepsilon, m) = 1$ for each $m \in \mathcal{M}$. Like [3], we define the
algorithm Sign($pk$, $sk$, $m$) that runs SignDerive($pk$, ($sk$, $\varepsilon$), $m$)
and returns the resulting output. For any set
$M = \{m_1, \ldots, m_k\} \subset \mathcal{M}$, we define
$\mathsf{Sign}(sk, M) := \{\mathsf{Sign}(sk, m_1), \ldots, \mathsf{Sign}(sk, m_k)\}$.
Also, $\mathsf{Verify}(pk, M, \{\sigma_m\}_{m \in M}) = 1$ means that
$\mathsf{Verify}(pk, m, \sigma_m) = 1$ for each $m \in M$.

Correctness. It is mandated that, for all pairs
$(pk, sk) \leftarrow \mathsf{Keygen}(\lambda)$, for any set
$M \subset \mathcal{M}$ and any message $m' \in \mathcal{M}$ such that
$P(M, m') = 1$, we have

- $\mathsf{SignDerive}(pk, (\mathsf{Sign}(sk, M), M), m') \neq \perp$.

- $\mathsf{Verify}(pk, m', \mathsf{SignDerive}(pk, (\mathsf{Sign}(sk, M), M), m')) = 1$.

Definition 3 ([3]). A $P$-homomorphic signature (Keygen, SignDerive, Verify) is
said to be unforgeable if no probabilistic polynomial-time (PPT) adversary has
non-negligible advantage in this game:

1. The challenger generates $(pk, sk) \leftarrow \mathsf{Keygen}(\lambda)$ and
   gives $pk$ to the adversary $\mathcal{A}$. It initializes two initially empty
   tables $T$ and $Q$.

2. $\mathcal{A}$ adaptively interleaves the following queries.

   - Signing queries: $\mathcal{A}$ chooses a message $m \in \mathcal{M}$. The
     challenger replies by choosing a handle $h$, runs
     $\sigma \leftarrow \mathsf{Sign}(sk, m)$ and stores $(h, m, \sigma)$ in the
     table $T$. The handle $h$ is returned to $\mathcal{A}$.

   - Derivation queries: $\mathcal{A}$ chooses a vector of handles
     $\vec{h} = (h_1, \ldots, h_k)$ and a message $m' \in \mathcal{M}$. The
     challenger retrieves the tuples $\{(h_i, m_i, \sigma_i)\}_{i=1}^k$ from $T$
     and returns $\perp$ if one of these does not exist. Otherwise, it defines
     $M := (m_1, \ldots, m_k)$ and
     $\{\sigma_m\}_{m \in M} = \{\sigma_1, \ldots, \sigma_k\}$. If $P(M, m') = 1$,
     the challenger runs
     $\sigma' \leftarrow \mathsf{SignDerive}(pk, (\{\sigma_m\}_{m \in M}, M), m')$,
     chooses a handle $h'$, stores $(h', m', \sigma')$ in $T$ and returns $h'$
     to $\mathcal{A}$.

   - Reveal queries: $\mathcal{A}$ chooses a handle $h$. If no tuple of the form
     $(h, m', \sigma')$ exists in $T$, the challenger returns $\perp$. Otherwise,
     it returns $\sigma'$ to $\mathcal{A}$ and adds $(m', \sigma')$ to the set $Q$.

3. $\mathcal{A}$ outputs a pair $(\sigma', m')$ and wins if the following
   conditions hold.

   - $\mathsf{Verify}(pk, m', \sigma') = 1$.

   - If $M \subset \mathcal{M}$ is the set of messages in $Q$, then
     $m' \notin P^*(M)$.
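The bookkeeping of the unforgeability game (the table T of handles and the set Q of revealed pairs) can be sketched as follows. This is only an illustration of the challenger's state machine: the scheme itself is passed in as callbacks, the class and method names are hypothetical, `None` stands for ⊥, and returning `None` when P(M, m') = 0 is an assumption since the definition does not say what happens in that case. The placeholder "scheme" at the bottom is trivially forgeable and is there only to exercise the code.

```python
import itertools


class Challenger:
    """State of the challenger in the unforgeability game (illustrative sketch)."""

    def __init__(self, sign, sign_derive, predicate):
        self.sign, self.sign_derive, self.predicate = sign, sign_derive, predicate
        self.T = {}                      # handle -> (message, signature)
        self.Q = set()                   # revealed (message, signature) pairs
        self._handles = itertools.count(1)

    def _store(self, m, sigma):
        h = next(self._handles)
        self.T[h] = (m, sigma)
        return h

    def signing_query(self, m):
        return self._store(m, self.sign(m))

    def derivation_query(self, handles, m_prime):
        if any(h not in self.T for h in handles):
            return None                  # some handle does not exist
        msgs = [self.T[h][0] for h in handles]
        sigs = [self.T[h][1] for h in handles]
        if not self.predicate(msgs, m_prime):
            return None                  # assumption: reject underivable m'
        return self._store(m_prime, self.sign_derive(sigs, msgs, m_prime))

    def reveal_query(self, h):
        if h not in self.T:
            return None
        m, sigma = self.T[h]
        self.Q.add((m, sigma))
        return sigma

    def adversary_wins(self, m_prime, sigma_prime, verify, closure):
        """Step 3: valid signature on a message outside P*(revealed messages)."""
        revealed = [m for m, _ in self.Q]
        return verify(m_prime, sigma_prime) and m_prime not in closure(revealed)


# Placeholder callbacks (NOT a secure scheme): messages are frozensets,
# the predicate is the subset predicate, a "signature" is just a tagged copy.
ch = Challenger(sign=lambda m: ("sig", m),
                sign_derive=lambda sigs, msgs, m: ("sig", m),
                predicate=lambda msgs, m: any(m <= x for x in msgs))
h1 = ch.signing_query(frozenset("ab"))
h2 = ch.derivation_query([h1], frozenset("a"))
revealed_sig = ch.reveal_query(h2)
```

Only revealed signatures enter Q, so a signature that was derived but never revealed does not count against the adversary's forgery.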

Definition 4 ([3]). A homomorphic signature (Keygen, Sign, SignDerive, Verify)
is strongly context hiding for the predicate $P$ if, for all key pairs
$(pk, sk) \leftarrow \mathsf{Keygen}(\lambda)$, for all messages
$M \subset \mathcal{M}^*$ and $m' \in \mathcal{M}$ such that $P(M, m') = 1$, the
following two distributions are statistically close:

$$\{(sk, \{\sigma_m\}_{m \in M} \leftarrow \mathsf{Sign}(sk, M), \mathsf{Sign}(sk, m'))\}_{sk, M, m'},$$

$$\{(sk, \{\sigma_m\}_{m \in M} \leftarrow \mathsf{Sign}(sk, M), \mathsf{SignDerive}(pk, (\{\sigma_m\}_{m \in M}, M), m'))\}_{sk, M, m'}.$$
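"Statistically close" is usually understood as: the statistical distance between the two distributions is negligible in the security parameter. As a small illustration of the quantity involved, the sketch below computes the statistical distance of two finite distributions given as dictionaries; the two example distributions are made-up placeholders and do not come from any scheme in the paper.

```python
def statistical_distance(p, q):
    """Return 1/2 * sum_x |p(x) - q(x)| for two finite distributions.

    p and q map outcomes to probabilities; missing outcomes have probability 0.
    """
    support = set(p) | set(q)
    return sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support) / 2


fresh = {"s0": 0.5, "s1": 0.5}        # hypothetical distribution of Sign outputs
derived = {"s0": 0.75, "s1": 0.25}    # hypothetical distribution of SignDerive outputs
gap = statistical_distance(fresh, derived)
print(gap)
```

A strongly context hiding scheme would need this gap to be negligible for every admissible choice of key pair and messages, not just on average.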

In [3], Ahn et al. showed that, if a scheme is strongly context hiding, then
Definition 3 can be simplified by removing the SignDerive and Reveal oracles and
only providing the adversary with an ordinary signing oracle.

As we will see, specific constructions leave a gap between signatures accepted
by the verification algorithm and those generated by the original signing
procedure. For these schemes, a stronger definition than Definition 4 may be
necessary in some situations.

To illustrate this, we first give an alternative definition which is almost
identical to the computational security definition of [3, Appendix A]: the only
difference is that, in the challenge phase, one of the signatures is supplied by
the adversary instead of being honestly generated by the challenger. This
modification is motivated by re-randomizable signatures. It allows for
adversaries who attempt to re-randomize one of the signatures obtained from the
oracle in order to embed some subliminal information that would help them win
the game.

Definition 5. A $P$-homomorphic signature (Keygen, Sign, SignDerive, Verify) is
weakly adaptively context hiding if no PPT adversary has non-negligible
advantage in the following game:

1. The challenger runs $(sk, pk) \leftarrow \mathsf{Keygen}(\lambda)$ and gives
   $pk$ to the adversary.

2. The adversary $\mathcal{A}$ adaptively interleaves queries exactly as in
   Definition 3.

3. The adversary $\mathcal{A}$ chooses a message set $M \subset \mathcal{M}$
   together with a set of signatures $\{\sigma_m\}_{m \in M}$ as well as another
   message $m' \in \mathcal{M}$. If $P(M, m') = 0$ or
   $\mathsf{Verify}(pk, M, \{\sigma_m\}_{m \in M}) = 0$, return $\perp$.
   Otherwise, the challenger flips a fair binary coin
   $\beta \leftarrow \{0, 1\}$. If $\beta = 0$, it computes a derived signature
   $\sigma^* = \mathsf{SignDerive}(pk, (\{\sigma_m\}_{m \in M}, M), m')$. If
   $\beta = 1$, it computes $\sigma^* = \mathsf{Sign}(sk, m')$. In either case,
   $\sigma^*$ is sent as a challenge to $\mathcal{A}$.

4. $\mathcal{A}$ is allowed to make another series of queries as in stage 2.

5. Eventually, $\mathcal{A}$ outputs a bit $\beta' \in \{0, 1\}$ and wins if
   $\beta' = \beta$. As usual, $\mathcal{A}$'s advantage is defined to be
   $\mathbf{Adv}(\mathcal{A}) = |\Pr[\beta' = \beta] - 1/2|$.

The latter definition can be seen as an analogue of a definition of unlinkability 
given by Prabhakaran and Rosulek m for homomorphic encryption: both mod- 
els account for adversarially-chosen original signatures or ciphertexts. 

We will see that Definitions 4 and 5 do not imply each other. While
incomparable, we believe that they both make sense in practice. For example,
when it comes to concealing the path followed by packets in network coding
signatures, Definition 5 ensures that each node only learns the last node
visited by incoming packets, even if it colludes with another node far upstream.

Towards unifying previous definitions, we now simplify Definition 5 as follows.
Instead of providing the adversary $\mathcal{A}$ with a signing oracle,
$\mathcal{A}$ is directly given the private key at the beginning.

Definition 6. A $P$-homomorphic signature is adaptively context hiding if no
PPT adversary has non-negligible advantage in the following game:

1. The challenger runs $(sk, pk) \leftarrow \mathsf{Keygen}(\lambda)$ and hands
   $(sk, pk)$ to $\mathcal{A}$.

2. The adversary $\mathcal{A}$ chooses a message set $M \subset \mathcal{M}$
   together with a set of signatures $\{\sigma_m\}_{m \in M}$ as well as another
   message $m' \in \mathcal{M}$. If $P(M, m') = 0$ or
   $\mathsf{Verify}(pk, M, \{\sigma_m\}_{m \in M}) = 0$, return $\perp$.
   Otherwise, the challenger flips a fair binary coin
   $\beta \leftarrow \{0, 1\}$. If $\beta = 0$, it computes a derived signature
   $\sigma^* = \mathsf{SignDerive}(pk, (\{\sigma_m\}_{m \in M}, M), m')$. If
   $\beta = 1$, it computes $\sigma^* = \mathsf{Sign}(sk, m')$. In either case,
   $\sigma^*$ is sent as a challenge to $\mathcal{A}$.

3. Eventually, $\mathcal{A}$ outputs a bit $\beta' \in \{0, 1\}$ and wins if
   $\beta' = \beta$. As usual, $\mathcal{A}$'s advantage is defined to be
   $\mathbf{Adv}(\mathcal{A}) = |\Pr[\beta' = \beta] - 1/2|$.
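The three steps of this game can be sketched as a small experiment harness. Everything below is an assumption made for illustration: the toy "scheme" has no keys and no unforgeability, a signature is simply a pair (message, nonce), and its derivation algorithm deliberately reuses the nonce of the original signature so that a linking adversary wins. It is not one of the constructions discussed in the paper.

```python
import secrets


def sign(sk, m):
    """Toy signature: the message together with a fresh 128-bit nonce."""
    return (m, secrets.randbits(128))


def sign_derive(pk, sigs, msgs, m_prime):
    """Toy derivation that does NOT re-randomize the nonce (linkable on purpose)."""
    return (m_prime, sigs[0][1])


def verify(pk, m, sig):
    return sig[0] == m


def predicate(msgs, m_prime):
    """Subset predicate on frozensets."""
    return any(m_prime <= m for m in msgs)


def adaptive_context_hiding_game(adversary):
    """Return True if the adversary guesses beta, None if the game aborts."""
    sk = pk = None                                   # the toy scheme has no keys
    msgs, sigs, m_prime = adversary.choose(sk, pk)   # step 2: adversarial input
    if not predicate(msgs, m_prime):
        return None
    if not all(verify(pk, m, s) for m, s in zip(msgs, sigs)):
        return None
    beta = secrets.randbits(1)
    if beta == 0:
        challenge = sign_derive(pk, sigs, msgs, m_prime)
    else:
        challenge = sign(sk, m_prime)
    return adversary.guess(challenge) == beta        # step 3


class LinkingAdversary:
    def choose(self, sk, pk):
        m = frozenset("ab")
        self.original = sign(sk, m)
        return [m], [self.original], frozenset("a")

    def guess(self, challenge):
        # A derived signature carries the nonce of the original one.
        return 0 if challenge[1] == self.original[1] else 1


wins = sum(adaptive_context_hiding_game(LinkingAdversary()) for _ in range(100))
print(wins)
```

A scheme meeting the definition would have to make the derived and fresh cases look alike even to an adversary that crafted the original signatures itself.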

While the latter definition seems sufficient for many applications, it still
does not imply Definition 4 and we may want signatures to be unlinkable in the
statistical sense. The resulting stronger definition implies both Definition 4
and Definition 6 and goes as follows.

Definition 7. A $P$-homomorphic signature (Keygen, Sign, SignDerive, Verify) is
completely context hiding if, for all pairs
$(pk, sk) \leftarrow \mathsf{Keygen}(\lambda)$, all messages
$M \subset \mathcal{M}^*$ and $m' \in \mathcal{M}$ such that $P(M, m') = 1$, for
all $\{\sigma_m\}_{m \in M}$ such that
$\mathsf{Verify}(pk, M, \{\sigma_m\}_{m \in M}) = 1$, the distribution
$\{(sk, \mathsf{Sign}(sk, m'))\}_{sk, M, m'}$ is statistically close to
$\{(sk, \mathsf{SignDerive}(pk, (\{\sigma_m\}_{m \in M}, M), m'))\}_{sk, M, m'}$.

In all schemes based on the dual system approach |4l‘23j, the existence of an
alternative distribution of acceptable signatures makes it seemingly impossible
to satisfy the above definition. In these schemes, the combination of strong
(i.e., Definition 4) and adaptive context hiding security thus appears as the
best we can hope for. For this reason, we chose to present Definition 6 first
instead of directly working with Definition 7.

Definition 7 assumes honestly generated keys $(sk, pk)$. It can be strengthened
by allowing the adversary to generate a pair $(sk, pk)$ of its own. In the
random oracle model, the construction of HH is easily seen to satisfy such a
stronger definition (if we assume that all public keys live in a cyclic group
which is part of common public parameters) because it has unique signatures. In
the standard model, we do not know of any scheme that would be secure in that
sense.

In the following, we can satisfy Definition 7 with our homomorphic signature
for subset predicates. In the case of linearly homomorphic signatures, we are
only able to meet Definition 6.


2.2 Complexity Assumptions 

We consider groups $(\mathbb{G}, \mathbb{G}_T)$ of composite order
$N = p_1 p_2 p_3$, for which a bilinear map
$e : \mathbb{G} \times \mathbb{G} \to \mathbb{G}_T$ is computable. For each
$i \in \{1, 2, 3\}$, we denote by $\mathbb{G}_{p_i}$ the subgroup of order
$p_i$. Also, for all distinct $i, j$, we call $\mathbb{G}_{p_i p_j}$ the
subgroup of order $p_i p_j$. An important property of composite order groups is
that pairing two elements of order $p_i$ and $p_j$, with $i \neq j$, always
gives the identity element $1_{\mathbb{G}_T}$.
In these groups, we rely on the following assumptions introduced in m- 

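Assumption 1. Given $g \leftarrow \mathbb{G}_{p_1}$,
$X_3 \leftarrow \mathbb{G}_{p_3}$, and $T$, it is infeasible to efficiently
decide if $T \in_R \mathbb{G}_{p_1 p_2}$ or $T \in_R \mathbb{G}_{p_1}$.

Assumption 2. Let $g, X_1 \leftarrow \mathbb{G}_{p_1}$,
$X_2, Y_2 \leftarrow \mathbb{G}_{p_2}$, $Y_3, Z_3 \leftarrow \mathbb{G}_{p_3}$.
Given a tuple $(g, X_1 X_2, Z_3, Y_2 Y_3)$ and $T$, it is hard to decide if
$T \in_R \mathbb{G}$ or $T \in_R \mathbb{G}_{p_1 p_3}$.

Assumption 3. Let elements $g, w, g^t, X_1 \leftarrow \mathbb{G}_{p_1}$ with
$t \leftarrow \mathbb{Z}_N$, $X_2, Y_2, Z_2 \leftarrow \mathbb{G}_{p_2}$,
$X_3, Y_3, Z_3 \leftarrow \mathbb{G}_{p_3}$. Given
$(g, w, g^t, X_1 X_2, X_3, Y_2 Y_3)$ and $T \in \mathbb{G}$, decide if
$T = w^t Z_3$ or $T = w^t Z_2 Z_3$.

Assumption 4. Let $g \leftarrow \mathbb{G}_{p_1}$,
$X_2, Y_2, Z_2 \leftarrow \mathbb{G}_{p_2}$, $X_3 \leftarrow \mathbb{G}_{p_3}$
and $a, b, c \leftarrow \mathbb{Z}_N$. Given
$(g, g^a, g^b, g^{ab} X_2, X_3, g^c Y_2, Z_2)$, it is infeasible to compute
$e(g, g)^{abc}$.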
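The orthogonality property of composite order groups stated at the beginning of this subsection can be checked in a toy "exponent model". This is a sketch under explicit assumptions: a group element gen^x is represented by its discrete logarithm x modulo N, the pairing e(gen^x, gen^y) = e(gen, gen)^{xy} is represented by x·y mod N (so the identity of the target group is 0), and the primes are tiny. Such a model exposes all discrete logarithms and is useful only for checking algebra.

```python
import random

P1, P2, P3 = 101, 103, 107          # toy primes; real instantiations use huge ones
N = P1 * P2 * P3


def pair(x, y):
    """Pairing in the exponent: e(gen^x, gen^y) = e(gen, gen)^(x*y)."""
    return (x * y) % N


def sample_subgroup(order):
    """Random non-identity element of the subgroup of the given prime order."""
    return (N // order) * random.randrange(1, order) % N


g = sample_subgroup(P1)      # element of G_{p1}
x2 = sample_subgroup(P2)     # element of G_{p2}
x3 = sample_subgroup(P3)     # element of G_{p3}

# Elements of distinct prime-order subgroups pair to the identity (0 here).
print(pair(g, x2), pair(g, x3), pair(x2, x3))
# An element paired with itself does not vanish.
print(pair(g, g) != 0)
```

In this representation an element of order p_i is a multiple of N / p_i, so the product of two elements from different subgroups is always a multiple of N.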

We also use bilinear maps $e : \mathbb{G} \times \mathbb{G} \to \mathbb{G}_T$
over groups of prime order $p$. In these groups, we rely on the following
hardness assumptions.

Definition 8 (0). The Decision Linear Problem (DLIN) in $\mathbb{G}$ is to
distinguish the distributions $(g^a, g^b, g^{ac}, g^{bd}, g^{c+d})$ and
$(g^a, g^b, g^{ac}, g^{bd}, g^z)$, where $a, b, c, d \leftarrow \mathbb{Z}_p^*$,
$z \leftarrow \mathbb{Z}_p^*$. The Decision Linear Assumption is the
intractability of DLIN for any PPT distinguisher $\mathcal{D}$.

Definition 9 (HI). In a group $\mathbb{G}$, the $q$-Simultaneous Flexible
Pairing Problem ($q$-SFP) is, given
$(g_z, h_z, g_r, h_r, a, \tilde{a}, b, \tilde{b} \in \mathbb{G})$ and $q$ tuples
$(z_j, r_j, s_j, t_j, u_j, v_j, w_j) \in \mathbb{G}^7$ such that

$$e(a, \tilde{a}) = e(g_z, z_j) \cdot e(g_r, r_j) \cdot e(s_j, t_j), \qquad
  e(b, \tilde{b}) = e(h_z, z_j) \cdot e(h_r, u_j) \cdot e(v_j, w_j), \qquad (1)$$

to find a new tuple $(z^*, r^*, s^*, t^*, u^*, v^*, w^*) \in \mathbb{G}^7$
satisfying (1) and such that $z^* \notin \{1_{\mathbb{G}}, z_1, \ldots, z_q\}$.

2.3 Structure-Preserving Signatures

Privacy-preserving protocols often require signing elements of bilinear groups
as if they were ordinary messages. Abe, Haralambiev and Ohkubo HE! (AHO) 
described such an efficient structure-preserving signature. The description
hereunder assumes public parameters
$\mathsf{pp} = ((\mathbb{G}, \mathbb{G}_T), g)$ consisting of bilinear groups
$(\mathbb{G}, \mathbb{G}_T)$ of prime order $p > 2^\lambda$, where
$\lambda \in \mathbb{N}$, and a generator $g \in \mathbb{G}$.

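Keygen(pp, $n$): given an upper bound $n \in \mathbb{N}$ on the number of group
   elements per signed message, choose generators
   $G_r, H_r \leftarrow \mathbb{G}$. Pick
   $\gamma_z, \delta_z \leftarrow \mathbb{Z}_p$ and
   $\gamma_i, \delta_i \leftarrow \mathbb{Z}_p$, for $i = 1$ to $n$. Then,
   compute $G_z = G_r^{\gamma_z}$, $H_z = H_r^{\delta_z}$ and
   $G_i = G_r^{\gamma_i}$, $H_i = H_r^{\delta_i}$ for each
   $i \in \{1, \ldots, n\}$. Finally, choose
   $\alpha_a, \alpha_b \leftarrow \mathbb{Z}_p$ and define
   $A = e(G_r, g^{\alpha_a})$ and $B = e(H_r, g^{\alpha_b})$. The public key is
   defined to be

   $$pk = (G_r, H_r, G_z, H_z, \{G_i, H_i\}_{i=1}^n, A, B) \in \mathbb{G}^{2n+4} \times \mathbb{G}_T^2$$

   while the private key is
   $sk = (\alpha_a, \alpha_b, \gamma_z, \delta_z, \{\gamma_i, \delta_i\}_{i=1}^n)$.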

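Sign($sk$, $(M_1, \ldots, M_n)$): to sign a vector
   $(M_1, \ldots, M_n) \in \mathbb{G}^n$ using $sk$, choose
   $\zeta, \rho_a, \rho_b, \omega_a, \omega_b \leftarrow \mathbb{Z}_p$ and
   compute $\theta_1 = g^{\zeta}$ as well as

   $$\theta_2 = g^{\rho_a - \gamma_z \zeta} \cdot \prod_{i=1}^n M_i^{-\gamma_i}, \qquad
     \theta_3 = G_r^{\omega_a}, \qquad
     \theta_4 = g^{(\alpha_a - \rho_a)/\omega_a},$$

   $$\theta_5 = g^{\rho_b - \delta_z \zeta} \cdot \prod_{i=1}^n M_i^{-\delta_i}, \qquad
     \theta_6 = H_r^{\omega_b}, \qquad
     \theta_7 = g^{(\alpha_b - \rho_b)/\omega_b}.$$

   The signature consists of
   $\sigma = (\theta_1, \theta_2, \theta_3, \theta_4, \theta_5, \theta_6, \theta_7) \in \mathbb{G}^7$.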

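Verify($pk$, $\sigma$, $(M_1, \ldots, M_n)$): given
   $\sigma = (\theta_1, \theta_2, \theta_3, \theta_4, \theta_5, \theta_6, \theta_7)$,
   return 1 iff these equalities hold:

   $$A = e(G_z, \theta_1) \cdot e(G_r, \theta_2) \cdot e(\theta_3, \theta_4) \cdot \prod_{i=1}^n e(G_i, M_i),$$

   $$B = e(H_z, \theta_1) \cdot e(H_r, \theta_5) \cdot e(\theta_6, \theta_7) \cdot \prod_{i=1}^n e(H_i, M_i).$$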
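As a sanity check, the first verification equation can be expanded term by term. The derivation below assumes the usual form of the AHO signing algorithm, namely $\theta_1 = g^{\zeta}$, $\theta_2 = g^{\rho_a - \gamma_z\zeta}\prod_i M_i^{-\gamma_i}$, $\theta_3 = G_r^{\omega_a}$, $\theta_4 = g^{(\alpha_a-\rho_a)/\omega_a}$, together with $G_z = G_r^{\gamma_z}$ and $G_i = G_r^{\gamma_i}$; the equation for $B$ is handled the same way with $(H_r, \delta_z, \delta_i, \rho_b, \omega_b, \alpha_b)$ and $(\theta_5, \theta_6, \theta_7)$.

```latex
\begin{align*}
e(G_z,\theta_1) &= e(G_r,g)^{\gamma_z\zeta}, \\
e(G_r,\theta_2) &= e(G_r,g)^{\rho_a-\gamma_z\zeta}\cdot\prod_{i=1}^{n} e(G_r,M_i)^{-\gamma_i}, \\
e(\theta_3,\theta_4) &= e(G_r,g)^{\omega_a\cdot(\alpha_a-\rho_a)/\omega_a}
                      = e(G_r,g)^{\alpha_a-\rho_a}, \\
\prod_{i=1}^{n} e(G_i,M_i) &= \prod_{i=1}^{n} e(G_r,M_i)^{\gamma_i}.
\end{align*}
% Multiplying the four lines, the exponents \gamma_z\zeta and \rho_a cancel,
% and so do the factors e(G_r,M_i)^{\pm\gamma_i}:
\[
e(G_z,\theta_1)\cdot e(G_r,\theta_2)\cdot e(\theta_3,\theta_4)\cdot
\prod_{i=1}^{n} e(G_i,M_i) = e(G_r,g)^{\alpha_a} = e(G_r,g^{\alpha_a}) = A .
\]
```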

The scheme was proved \ 1 12\ existentially unforgeable under chosen-message
attacks under the $q$-SFP assumption, where $q$ is the number of signing queries.

As shown in DEI, signature components $\{\theta_i\}_{i=2}^7$ can be publicly
randomized to obtain a different signature
$\{\theta_i'\}_{i=1}^7 \leftarrow \mathsf{ReRand}(pk, \sigma)$ on
$(M_1, \ldots, M_n)$. After randomization, we have $\theta_1' = \theta_1$ while
$\{\theta_i'\}_{i=2}^7$ are uniformly distributed among the values
$(\theta_2, \ldots, \theta_7)$ such that the equalities
$e(G_r, \theta_2') \cdot e(\theta_3', \theta_4') = e(G_r, \theta_2) \cdot e(\theta_3, \theta_4)$
and
$e(H_r, \theta_5') \cdot e(\theta_6', \theta_7') = e(H_r, \theta_5) \cdot e(\theta_6, \theta_7)$
hold. This re-randomization is performed by choosing
$\varrho_2, \varrho_5, \mu, \nu \leftarrow \mathbb{Z}_p$ and computing

$$\theta_2' = \theta_2 \cdot \theta_4^{\varrho_2}, \qquad
  \theta_3' = (\theta_3 \cdot G_r^{-\varrho_2})^{1/\mu}, \qquad
  \theta_4' = \theta_4^{\mu}, \qquad (2)$$

$$\theta_5' = \theta_5 \cdot \theta_7^{\varrho_5}, \qquad
  \theta_6' = (\theta_6 \cdot H_r^{-\varrho_5})^{1/\nu}, \qquad
  \theta_7' = \theta_7^{\nu}.$$

As a result, $\{\theta_i'\}_{i \in \{3,4,6,7\}}$ are statistically independent
of the message and other signature components. This implies that, in
privacy-preserving protocols, re-randomized
$\{\theta_i'\}_{i \in \{3,4,6,7\}}$ can be safely given in the clear as long as
$(M_1, \ldots, M_n)$ and $\{\theta_i'\}_{i \in \{1,2,5\}}$ are given in
committed form.
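The invariant preserved by the re-randomization can be verified directly. The short derivation below assumes the update rule $\theta_2' = \theta_2\theta_4^{\varrho_2}$, $\theta_3' = (\theta_3 G_r^{-\varrho_2})^{1/\mu}$, $\theta_4' = \theta_4^{\mu}$ and uses only the bilinearity of $e$; the second invariant follows identically with $(H_r, \theta_5, \theta_6, \theta_7, \varrho_5, \nu)$.

```latex
\begin{align*}
e(G_r,\theta_2')\cdot e(\theta_3',\theta_4')
  &= e(G_r,\theta_2\,\theta_4^{\varrho_2})\cdot
     e\bigl((\theta_3\,G_r^{-\varrho_2})^{1/\mu},\,\theta_4^{\mu}\bigr) \\
  &= e(G_r,\theta_2)\cdot e(G_r,\theta_4)^{\varrho_2}\cdot
     e(\theta_3\,G_r^{-\varrho_2},\theta_4) \\
  &= e(G_r,\theta_2)\cdot e(G_r,\theta_4)^{\varrho_2}\cdot
     e(\theta_3,\theta_4)\cdot e(G_r,\theta_4)^{-\varrho_2} \\
  &= e(G_r,\theta_2)\cdot e(\theta_3,\theta_4).
\end{align*}
```

Since $\theta_1$ is unchanged, the re-randomized tuple satisfies the same verification equations as the original signature.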


3 Separation Results 

Separating Definitions 4 and 5. Let us consider the following variant² of the
construction in gj, which relies on the Lewko-Waters signatures m and bilinear
groups whose order is a product $N = p_1 p_2 p_3$ of three primes. If $n$
denotes the dimension of signed vectors, the public key is
$pk = (g, e(g, g)^{\alpha}, u, v, \{h_i\}_{i=1}^n, X_3)$, where
$\alpha \in_R \mathbb{Z}_N$, $g, u, v, h_1, \ldots, h_n \in_R \mathbb{G}_{p_1}$,
$X_3 \in_R \mathbb{G}_{p_3}$ and the private key consists of
$sk = (g^{\alpha}, \kappa)$, where $\kappa$ is the seed of a pseudorandom
function $\Psi$. The latter is used to de-randomize the scheme and make sure
that all vectors of the same file will be signed using partially identical
random coins.

To sign a vector $\vec{v} = (v_1, \ldots, v_n) \in \mathbb{Z}_N^n$ using the
file identifier $\tau$, the signer computes a pseudorandom
$r = \Psi(\kappa, \tau) \in \mathbb{Z}_N$ which is used to compute

$$(\sigma_1, \sigma_2, \sigma_3) = \Bigl( g^{\alpha} \cdot (u^{\tau} \cdot v)^r \cdot R_3,\ \
  g^r \cdot R_3',\ \ \bigl(\prod_{i=1}^n h_i^{v_i}\bigr)^r \cdot R_3'' \Bigr),$$

with $R_3, R_3', R_3'' \in_R \mathbb{G}_{p_3}$. The homomorphic property follows
from the fact that all vectors of the same dataset are signed using the same
$r \in \mathbb{Z}_N$. The homomorphic evaluation algorithm proceeds in the
obvious way and combines signatures
$\{(\sigma_{i,1}, \sigma_{i,2}, \sigma_{i,3})\}_{i=1}^k$ by linearly combining
the $\{\sigma_{i,3}\}_{i=1}^k$ and re-randomizing the $\mathbb{G}_{p_3}$
components. Note that the underlying exponent $r$ is not re-randomized, so that
all pairs $(\sigma_{i,1}, \sigma_{i,2})$ share the same $\mathbb{G}_{p_1}$
components.

It is easy to see that the construction is strongly context hiding in the sense
of Definition 4. Indeed, the signing algorithm is honestly run in the first
distribution of Definition 4. This implies that, for any message set
$M = \{(\tau, \vec{v}_1), \ldots, (\tau, \vec{v}_k)\} \subset \mathcal{M}$, the
underlying $\log_g(\sigma_2)$ will have the same value no matter if the second
signature $(\sigma_1, \sigma_2)$ is produced by Sign or SignDerive.

However, the scheme does not satisfy Definition 5. Indeed, in step 2, the
adversary can first invoke the signing oracle on $k$ occasions to obtain
signatures for some set
$M = \{(\tau, \vec{v}_1), \ldots, (\tau, \vec{v}_k)\}$ of its choice. If we
denote by $\{\sigma_m\}_{m \in M}$ the resulting signatures, the adversary
re-randomizes $\{\sigma_m\}_{m \in M}$ in such a way that each randomized
$\sigma_m$ is of the form
$\bigl( g^{\alpha} \cdot (u^{\tau} \cdot v)^{r'} \cdot R_3,\ g^{r'} \cdot R_3',\ (\prod_{i=1}^n h_i^{v_i})^{r'} \cdot R_3'' \bigr)$,
for some fresh $r' \in_R \mathbb{Z}_N$. The adversary $\mathcal{A}$ can then
choose a random message $m' \in \mathcal{M}$ such that $P(M, m') = 1$ and send
$((M, \{\sigma_m\}_{m \in M}), m')$ to the challenger. The latter returns a
challenge signature $\sigma^* = (\sigma_1^*, \sigma_2^*, \sigma_3^*)$ on $m'$
and $\mathcal{A}$ can immediately figure out if $\sigma^*$ is fresh or derived,
by testing if $e(\sigma_2^*, g) = e(\sigma_{m,2}, g)$. With overwhelming
probability, the latter equality only holds if $\beta = 0$.
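The linking test above can be replayed in a toy exponent model. The sketch rests on several assumptions that are not in the paper: group elements are represented by their discrete logarithms modulo N (so multiplication becomes addition and the pairing becomes a product of exponents), the primes are tiny, the pseudorandom function is instantiated with SHA-256, and verification is omitted because the variant's verification equation is not spelled out in the text. Only the structure of the attack is meant to be faithful.

```python
import hashlib
import random

P1, P2, P3 = 101, 103, 107
N = P1 * P2 * P3


def pair(x, y):
    return (x * y) % N


def sub(order):
    """Random non-identity element of the subgroup of the given prime order."""
    return (N // order) * random.randrange(1, order) % N


n = 3
alpha, kappa = random.randrange(1, N), b"prf-seed"
g, u, v = sub(P1), sub(P1), sub(P1)
hs = [sub(P1) for _ in range(n)]


def prf(seed, tau):
    digest = hashlib.sha256(seed + tau.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") % N


def make_signature(r, tau, vec):
    w = sum(h * c for h, c in zip(hs, vec))           # exponent of prod h_i^{v_i}
    return ((alpha * g + r * (tau * u + v) + sub(P3)) % N,
            (r * g + sub(P3)) % N,
            (r * w + sub(P3)) % N)


def sign(tau, vec):
    return make_signature(prf(kappa, tau), tau, vec)  # r is derived from the PRF


def rerandomize(sig, tau, vec, delta):
    """Public operation: shifts the exponent r to r + delta using pk only."""
    w = sum(h * c for h, c in zip(hs, vec))
    return ((sig[0] + delta * (tau * u + v)) % N,
            (sig[1] + delta * g) % N,
            (sig[2] + delta * w) % N)


def sign_derive(sigs, weights):
    """Combine the third components; keep sigma_1, sigma_2 up to G_{p3} noise."""
    s3 = sum(b * s[2] for b, s in zip(weights, sigs))
    return ((sigs[0][0] + sub(P3)) % N, (sigs[0][1] + sub(P3)) % N,
            (s3 + sub(P3)) % N)


def looks_derived(challenge, original):
    return pair(challenge[1], g) == pair(original[1], g)


tau, vec = 5, [1, 2, 3]
tampered = rerandomize(sign(tau, vec), tau, vec, delta=random.randrange(1, P1))
derived = sign_derive([tampered], [2])                # case beta = 0
fresh = sign(tau, [2, 4, 6])                          # case beta = 1
print(looks_derived(derived, tampered), looks_derived(fresh, tampered))
```

The derived signature inherits the shifted exponent chosen by the adversary, whereas a fresh signature falls back to the exponent fixed by the pseudorandom function, which is what the pairing test detects.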

2 This variant is obtained by applying Freeman’s framework (213 to Lewko- Waters 
signatures eh, which guarantees its unforgeability. 




Separating Definitions 5 and 6. The original construction of 0 works exactly
like the scheme outlined in the previous paragraph with the difference that it
prevents public randomizations of the $\mathbb{G}_{p_1}$ components of
signatures $(\sigma_1, \sigma_2, \sigma_3)$. More precisely, the scheme makes
use of an additional collision-resistant hash function
$H : \{0, 1\}^* \to \mathbb{Z}_N$. If the file identifier is $\tau$, a vector
$\vec{v} = (v_1, \ldots, v_n)$ is signed by computing
$r = \Psi(\kappa, \tau) \in \mathbb{Z}_N$, $\tau' = H(\tau, e(g, g)^r)$ and
returning

$$(\sigma_1, \sigma_2, \sigma_3) = \Bigl( g^{\alpha} \cdot (u^{\tau'} \cdot v)^r \cdot R_3,\ \
  g^r \cdot R_3',\ \ \bigl(\prod_{i=1}^n h_i^{v_i}\bigr)^r \cdot R_3'' \Bigr),$$

with $R_3, R_3', R_3'' \in_R \mathbb{G}_{p_3}$. The security proof of 0 implies
that, if the adversary is given signatures
$\{(\sigma_{i,1}, \sigma_{i,2}, \sigma_{i,3})\}_{i=1}^k$ on messages
$(\tau, \vec{v}_1), \ldots, (\tau, \vec{v}_k)$, the adversary cannot generate a
signature $(\sigma_1, \sigma_2, \sigma_3)$ on $(\tau, \vec{y})$ such that
$e(\sigma_2, g) \neq e(\sigma_{i,2}, g)$ for each $i$. Essentially, since
$(\sigma_{i,1}, \sigma_{i,2})$ can be seen as a Lewko-Waters signature on the
message $H(\tau, e(g, g)^r)$, any valid signature
$(\sigma_1, \sigma_2, \sigma_3)$ for which
$e(\sigma_{i,2}, g) \neq e(\sigma_2, g)$ implies either an attack against the
signature scheme of pH) or a breach in the collision-resistance of $H$.

Let us consider an adversary in the sense of Definition 5. Since signatures
cannot be publicly randomized, when the adversary enters the challenge phase
in step 3, it can only choose a message set
$M = \{(\tau, \vec{v}_1), \ldots, (\tau, \vec{v}_k)\}$ and signatures
$\{(\sigma_{m,1}, \sigma_{m,2}, \sigma_{m,3})\}_{m \in M}$ for which
$\{e(\sigma_{m,2}, g)\}_{m \in M}$ has the same value as in signatures obtained
from the signing oracle at step 2. Therefore, the only way for $\mathcal{A}$ to
have non-negligible advantage in the game of Definition 5 is to choose
$(M, \{\sigma_m\}_{m \in M})$ where $\{\sigma_m\}_{m \in M}$ is obtained by
introducing a $\mathbb{G}_{p_2}$ component in a signature obtained from the
signing oracle. Otherwise, the distribution of the challenge signature
$(\sigma_1^*, \sigma_2^*, \sigma_3^*)$ does not depend on $\beta \in \{0, 1\}$
in step 3. Using
the same arguments as in the proof of Theorem^ we can prove that Assumption 
1 can be broken if $\mathcal{A}$ can output a set $\{\sigma_m\}_{m \in M}$ where
one of the signatures contains a $\mathbb{G}_{p_2}$ component. If $H$ is
collision-resistant and under the assumptions used in 0, the scheme is thus
weakly adaptively context hiding.

Now, we easily observe that the original scheme of 0 is not adaptively context
hiding in the sense of Definition 6. Recall that the adversary is given the
private key $sk = (g^{\alpha}, \kappa)$ at the beginning of the game. In the
challenge phase, it can thus choose a message set $M \subset \mathcal{M}$ and
signatures $\{\sigma_m\}_{m \in M}$ for which each $\sigma_m$ is of the form

$$(\sigma_{m,1}, \sigma_{m,2}, \sigma_{m,3}) = \Bigl( g^{\alpha} \cdot (u^{\tau'} \cdot v)^{r'} \cdot R_3,\ \
  g^{r'} \cdot R_3',\ \ \bigl(\prod_{i=1}^n h_i^{v_i}\bigr)^{r'} \cdot R_3'' \Bigr),$$

with $R_3, R_3', R_3'' \in_R \mathbb{G}_{p_3}$, and for some random
$r' \in_R \mathbb{Z}_N \setminus \{\Psi(\kappa, \tau)\}$. When receiving
$(M, \{\sigma_m\}_{m \in M})$ and $m'$ such that $P(M, m') = 1$, the challenger
runs SignDerive on $\{\sigma_m\}_{m \in M}$ if $\beta = 0$. If $\beta = 1$, it
ignores $\{\sigma_m\}_{m \in M}$ and simply generates a fresh signature on
$m'$. In the latter case, the challenge signature
$(\sigma_1^*, \sigma_2^*, \sigma_3^*)$ is such that
$\log_g(\sigma_2^*) = \Psi(\kappa, \tau) \bmod p_1$ and, since the adversary
knows $\kappa$, it can easily test whether
$e(\sigma_2^*, g) = e(g, g)^{\Psi(\kappa, \tau)}$ and, if so, return
$\beta' = 1$.

Later on, we will see an example of a scheme that satisfies Definition 6 but
fails to be secure as per Definition 4. The two definitions are thus
incomparable.

4 An Adaptively Context Hiding Linearly Homomorphic
Scheme in the Standard Model

So far, the scheme of 0 is seemingly the only linearly homomorphic signature
in the standard model to satisfy Definition 4. This section presents a linearly
homomorphic signature satisfying both Definition 4 and the adaptive context
hiding property captured by Definition 6.

The scheme works over groups whose order is a product $N = p_1 p_2 p_3$ of three
primes. Like 0, it builds on Lewko-Waters signatures, where public keys contain
$(g, e(g, g)^{\alpha}, u, v)$, with $g, u, v \in \mathbb{G}_{p_1}$ and
$\alpha \in \mathbb{Z}_N$, and a signature on $m$ consists of
$(g^{\alpha} \cdot (u^m \cdot v)^r \cdot R_3,\ g^r \cdot R_3')$, for some
$R_3, R_3' \in \mathbb{G}_{p_3}$. A difference with 0 is that
$e(g, g)^{\alpha}$ is replaced by $g^{\alpha}$ in the public key and signatures
are obtained by aggregating a Lewko-Waters signature on the file identifier
$\tau$ and a signed vector hash $(\prod_{i=1}^n g_i^{v_i})^{\alpha}$ of the
vector $\vec{v} = (v_1, \ldots, v_n)$, where
$(g_1, \ldots, g_n) \in \mathbb{G}_{p_1}^n$ is part of the public key. We note
that $(\prod_{i=1}^n g_i^{v_i})^{\alpha}$ is not a secure homomorphic signature
in general: it can actually be seen as a one-time linearly homomorphic
signature where only one message set
$M = \{(\tau, \vec{v}_1), \ldots, (\tau, \vec{v}_k)\}$ can be signed.
Nevertheless, we will show that aggregating the two components actually
provides unforgeability. Moreover, beyond providing a stronger flavor of
privacy than 0, it also shortens signatures by 33%.

For simplicity, the scheme is described in terms of composite order groups. It 
is very plausible that Lewko’s techniques m apply to translate the scheme in 
the prime order setting. 

4.1 Construction 

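Keygen($\lambda$, $n$): given $\lambda \in \mathbb{N}$ and an integer
   $n \in \mathsf{poly}(\lambda)$, choose bilinear groups
   $(\mathbb{G}, \mathbb{G}_T)$ of order $N = p_1 p_2 p_3$, where
   $p_i > 2^{\lambda}$ for each $i \in \{1, 2, 3\}$. Choose
   $\alpha \leftarrow \mathbb{Z}_N$, $g, u, v \leftarrow \mathbb{G}_{p_1}$,
   $X_{p_3} \leftarrow \mathbb{G}_{p_3}$, $g_i \leftarrow \mathbb{G}_{p_1}$ for
   $i = 1$ to $n$. Then, select an identifier space $\mathcal{T}$. The private
   key is $sk := \alpha$ while the public key is

   $$pk := \bigl((\mathbb{G}, \mathbb{G}_T), N, g, g^{\alpha}, u, v, \{g_i\}_{i=1}^n, X_{p_3}\bigr).$$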

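Sign($sk$, $\tau$, $\vec{v}$): on input of a vector
   $\vec{v} = (v_1, \ldots, v_n) \in \mathbb{Z}_N^n$, a file identifier
   $\tau \in \mathcal{T}$ and the private key $sk = \alpha \in \mathbb{Z}_N$,
   return $\perp$ if $\vec{v} = \vec{0}$.³ Otherwise, conduct the following
   steps. First, choose $r \leftarrow \mathbb{Z}_N$ and
   $R_3, R_3' \leftarrow \mathbb{G}_{p_3}$. Then, compute a signature
   $\sigma = (\sigma_1, \sigma_2)$ as

   $$\sigma_1 = (g_1^{v_1} \cdots g_n^{v_n})^{\alpha} \cdot (u^{\tau} \cdot v)^r \cdot R_3, \qquad
     \sigma_2 = g^r \cdot R_3'.$$

SignDerive($pk$, $\tau$, $\{(\beta_i, \sigma_i)\}_{i=1}^{\ell}$): given $pk$, a
   file identifier $\tau$ and $\ell$ tuples $(\beta_i, \sigma_i)$, parse
   $\sigma_i$ as $\sigma_i = (\sigma_{i,1}, \sigma_{i,2})$ for $i = 1$ to
   $\ell$. Then, choose $\tilde{r} \leftarrow \mathbb{Z}_N$,
   $\tilde{R}_3, \tilde{R}_3' \leftarrow \mathbb{G}_{p_3}$ and compute

   $$\sigma_1 = \prod_{i=1}^{\ell} \sigma_{i,1}^{\beta_i} \cdot (u^{\tau} \cdot v)^{\tilde{r}} \cdot \tilde{R}_3, \qquad
     \sigma_2 = \prod_{i=1}^{\ell} \sigma_{i,2}^{\beta_i} \cdot g^{\tilde{r}} \cdot \tilde{R}_3',$$

   and output $(\sigma_1, \sigma_2)$.

Verify($pk$, $\tau$, $\vec{y}$, $\sigma$): given a public key $pk$, a signature
   $\sigma = (\sigma_1, \sigma_2)$ and a message $(\tau, \vec{y})$, where
   $\tau \in \mathbb{Z}_N$ and
   $\vec{y} = (y_1, \ldots, y_n) \in (\mathbb{Z}_N)^n$, return $\perp$ if
   $\vec{y} = \vec{0}$. Otherwise, return 1 if and only if
   $e(\sigma_1, g) = e(g_1^{y_1} \cdots g_n^{y_n}, g^{\alpha}) \cdot e(u^{\tau} \cdot v, \sigma_2)$.

3 In the construction, we disallow signatures on the all-zeroes vector
$\vec{0}$. This is not a restriction since, in all applications of linearly
homomorphic signatures, a unit vector $(0, \ldots, 1, \ldots, 0)$ of appropriate
length is appended to signed vectors.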

Verifying the correctness of the scheme is straightforward since pairing an ele- 
ment of G Pl with an element of G P3 always gives the identity element in Gt- 
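The correctness claim can be spelled out as a short worked computation. This is only a sketch, using the signature shape produced by Sign above and writing $\vec{y} = \vec{v}$ for the signed vector; the only fact used is that $e(R, h) = 1_{\mathbb{G}_T}$ whenever $R \in \mathbb{G}_{p_3}$ and $h \in \mathbb{G}_{p_1}$.

```latex
\begin{align*}
e(\sigma_1, g)
  &= e\big((g_1^{v_1} \cdots g_n^{v_n})^{\alpha}, g\big) \cdot e\big((u^{\tau} v)^{r}, g\big) \cdot e(R_3, g) \\
  &= e\big(g_1^{v_1} \cdots g_n^{v_n}, g^{\alpha}\big) \cdot e\big(u^{\tau} v, g^{r}\big), \\
e(u^{\tau} v, \sigma_2)
  &= e\big(u^{\tau} v, g^{r}\big) \cdot e\big(u^{\tau} v, R'_3\big)
   = e\big(u^{\tau} v, g^{r}\big).
\end{align*}
```

Substituting the second line into the first gives exactly the verification equation. The same computation applies to the output of SignDerive, since raising to $\beta_i$ and multiplying keeps the $\mathbb{G}_{p_1}$ components in the required form for the vector $\sum_i \beta_i \vec{v}_i$.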


4.2 Security 

Theorem 1 . The scheme is adaptively context hiding if Assumption 1 holds. 
(The proof is given in the full version of the paper) . 

As already mentioned, computational adaptive context hiding security does not 
imply statistical strong context hiding security (cf. Definition 0J) in general. Let 
us consider a simple modification of the scheme. The public key includes $e(g, g)^{\psi}$,
for some $\psi \in_R \mathbb{Z}_N$ which is not part of sk. Original signatures are augmented with
$\sigma_3 = e(g, g)^{\psi \cdot r}$, which is ignored by the verification algorithm. Also, SignDerive
replaces $\sigma_3$ by a random element of $\mathbb{G}_T$. Although this artificial scheme can be
proved adaptively context hiding under Assumptions 1 and 4, it does not meet 
the requirements of Definition 01 

Yet, it is immediate that the system of Section IPI is also secure in the sense 
of Definition 0| 

Theorem 2. The scheme is unforgeable assuming that Assumptions 1, 2, 3 and
4 hold. (The proof is given in the full version of the paper.)

In the full version of the paper, we show that the same scheme can be safely 
instantiated in prime order groups if we settle for the weaker privacy definition 
used in [1 111212 dj . The unforgeability of this modified scheme can be proved under
the standard Diffie-Hellman assumption. To date, this construction turns out
to be the shortest linearly homomorphic signature based on a simple assumption. 

5 A Construction with Short Keys for Subset Predicates 

In this section, we use the malleability properties of Groth-Sahai proofs (al- 
ready exploited in, e.g., 1(11211191 1 to construct a homomorphic signature for 
subset predicates. The main advantage over the approach of 0 is that we obtain
constant-size⁴ public keys in the standard model. In the standard model,
the CP- ABE approach of 0 is currently limited to provide linear-size public 
keys in the maximal length of signed messages. 

This limitation could be avoided using a ciphertext-policy adaption of the 
unbounded key-policy ABE system of m However, the ABE construction of 
m is only known to be selectively secure and, for the time being, no fully 
secure unbounded CP-ABE system is available. Conceivably, such a scheme can 

4 By “constant” , we mean that it only depends on the security parameter and not on 
the length of messages to be signed. 
be obtained by extending the techniques of m Still, the resulting system would
probably encounter the same difficulties as in Sectional when it comes to obtain 
complete context hiding security. In contrast, our scheme is proved completely 
context hiding and fully (as opposed to selective-message) secure. It also allows 
for messages of unbounded (but polynomial) length. 

In homomorphic signatures for subset predicates, the message space $\mathcal{M}$ can
be defined as the set of tuples $\mathcal{M} := \Sigma^{*}$, where $\Sigma$ is a set of words. The predicate
P is defined in such a way that, for any polynomials $n$ and $n'$, we have

$P(\{m_1, \ldots, m_n\}, \{m'_1, \ldots, m'_{n'}\}) = 1$

$\iff (n' \le n) \wedge (m'_j \in \{m_1, \ldots, m_n\}$ for $j = 1$ to $n')$.

The intuition of the scheme begins with the following naive construction, based
on any digital signature, that only works when privacy is not a concern. The
key pair of the scheme is a standard digital signature key pair (sk, pk). When a
message Msg = $\{m_1, \ldots, m_n\}$ must be signed, the signer generates a fresh key
pair (sk', pk'), certifies pk' by computing $\sigma_{pk'} \leftarrow$ Sign(sk, pk') and returns
$(pk', \sigma_{pk'}, \{\sigma_i = \mathrm{Sign}(sk', m_i)\}_{i=1}^{n})$. This simple construction immediately allows
signature derivations for subset predicates. Moreover, since each signed set of
words Msg involves a different public key pk', there is no way to generate a
signature on a message Msg* that mixes words from two distinct signed messages
Msg$_1$, Msg$_2$. However, this construction is trivially not context hiding. To
achieve that property, instead of letting pk' and $\{\sigma_i\}_{i=1}^{n}$ appear in the clear
within signatures, we let them appear in committed form and appeal to
non-interactive witness indistinguishable (NIWI) arguments of knowledge of these
signatures and keys. Then, the randomizability properties of Groth-Sahai proofs
come in handy to obtain the desired privacy properties.

To realize the above idea, we work with Waters signatures m and the 
structure-preserving signature of Abe et al. |H2j because they make it possi- 
ble to work with linear pairing product equations. As observed in EH. these 
equations have proofs that only depend on the randomness of Groth-Sahai com- 
mitments and not on the committed witnesses or on the right-hand-side member 
of the equation. In the SignDerive algorithm, this allows updating some of the
witnesses in such a way that the old proof remains valid. 

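In the following notations, we define a coordinate-wise pairing $E : \mathbb{G} \times \mathbb{G}^3 \to \mathbb{G}_T^3$
such that, for any element $h \in \mathbb{G}$ and any vector $\vec{g} = (g_1, g_2, g_3)$, we have
$E(h, \vec{g}) = (e(h, g_1), e(h, g_2), e(h, g_3))$. In the following, when $X \in \mathbb{G}$ (resp.
$Y \in \mathbb{G}_T$), the notation $\iota_{\mathbb{G}}(X)$ (resp. $\iota_{\mathbb{G}_T}(Y)$) will be used to denote the vector
$(1_{\mathbb{G}}, 1_{\mathbb{G}}, X) \in \mathbb{G}^3$ (resp. the vector $(1_{\mathbb{G}_T}, 1_{\mathbb{G}_T}, Y) \in \mathbb{G}_T^3$).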

Keygen($\lambda$): given a security parameter $\lambda \in \mathbb{N}$, choose bilinear groups $(\mathbb{G}, \mathbb{G}_T)$
of prime order $p > 2^{\lambda}$. Then, do the following.

1. Generate a Groth-Sahai CRS $\mathbf{f} = (\vec{f}_1, \vec{f}_2, \vec{f}_3)$ for the perfect witness
indistinguishability setting. Namely, choose $\vec{f}_1 = (f_1, 1, g)$, $\vec{f}_2 = (1, f_2, g)$,
and $\vec{f}_3 = \vec{f}_1^{\,\xi_1} \cdot \vec{f}_2^{\,\xi_2} \cdot (1, 1, g)^{-1}$, with $f_1, f_2 \leftarrow_R \mathbb{G}$, $\xi_1, \xi_2 \leftarrow_R \mathbb{Z}_p$.

2. Generate a key pair $(sk_{AHO}, pk_{AHO})$ for the AHO signature in order to
sign messages consisting of a single group element. This key pair consists of

$pk_{AHO} = (G_r, H_r, G_z = G_r^{\gamma_z}, H_z = H_r^{\delta_z}, G_1 = G_r^{\gamma_1}, H_1 = H_r^{\delta_1}, A, B)$

and $sk_{AHO} = (\alpha_a, \alpha_b, \gamma_z, \delta_z, \gamma_1, \delta_1)$.

3. Generate parameters for the Waters signature. Namely, choose group
elements $h \leftarrow_R \mathbb{G}$ and $(u_0, u_1, \ldots, u_L) \leftarrow_R \mathbb{G}^{L+1}$. These are used to
implement a hash function $H_{\mathbb{G}} : \{0,1\}^L \to \mathbb{G}$ such that, for any string
$m = m[1] \ldots m[L] \in \{0,1\}^L$, $H_{\mathbb{G}}(m) = u_0 \cdot \prod_{i=1}^{L} u_i^{m[i]}$.

The public key is defined to be $pk := ((\mathbb{G}, \mathbb{G}_T), g, \mathbf{f}, pk_{AHO}, h, \{u_i\}_{i=0}^{L})$
and the private key is $sk = sk_{AHO}$. The public key defines $\Sigma = \{0,1\}^L$.

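Sign(sk, Msg): on input of a message Msg = $\{m_i\}_{i=1}^{n}$, where $m_i \in \{0,1\}^L$ for
each i, and the private key $sk = sk_{AHO}$, do the following.

1. Choose a new public key $X = g^{x}$ for Waters signatures, with $x \leftarrow_R \mathbb{Z}_p$.
Generate a Groth-Sahai commitment $C_X = \iota_{\mathbb{G}}(X) \cdot \vec{f}_1^{\,r_X} \cdot \vec{f}_2^{\,s_X} \cdot \vec{f}_3^{\,t_X}$,
with $r_X, s_X, t_X \leftarrow_R \mathbb{Z}_p$.

2. Generate an AHO signature $(\theta_1, \ldots, \theta_7) \in \mathbb{G}^7$ on the group element
$X \in \mathbb{G}$. Then, for each $j \in \{1, 2, 5\}$, generate Groth-Sahai commitments
$C_{\theta_j} = \iota_{\mathbb{G}}(\theta_j) \cdot \vec{f}_1^{\,r_{\theta_j}} \cdot \vec{f}_2^{\,s_{\theta_j}} \cdot \vec{f}_3^{\,t_{\theta_j}}$. Finally, generate NIWI proofs
$\pi_{AHO,1}, \pi_{AHO,2} \in \mathbb{G}^3$ that committed variables $(X, \theta_1, \theta_2, \theta_5)$ satisfy

$A \cdot e(\theta_3, \theta_4)^{-1} = e(G_z, \theta_1) \cdot e(G_r, \theta_2) \cdot e(G_1, X)$    (3)

$B \cdot e(\theta_6, \theta_7)^{-1} = e(H_z, \theta_1) \cdot e(H_r, \theta_5) \cdot e(H_1, X)$

These proofs are obtained as

$\pi_{AHO,1} = (G_z^{-r_{\theta_1}} G_r^{-r_{\theta_2}} G_1^{-r_X},\ G_z^{-s_{\theta_1}} G_r^{-s_{\theta_2}} G_1^{-s_X},\ G_z^{-t_{\theta_1}} G_r^{-t_{\theta_2}} G_1^{-t_X})$
$\pi_{AHO,2} = (H_z^{-r_{\theta_1}} H_r^{-r_{\theta_5}} H_1^{-r_X},\ H_z^{-s_{\theta_1}} H_r^{-s_{\theta_5}} H_1^{-s_X},\ H_z^{-t_{\theta_1}} H_r^{-t_{\theta_5}} H_1^{-t_X})$

3. For each $i \in \{1, \ldots, n\}$, generate a Waters signature $(\sigma_{i,1}, \sigma_{i,2})$ on the
word $m_i \in \{0,1\}^L$ by computing $(\sigma_{i,1}, \sigma_{i,2}) = (h^{x} \cdot H_{\mathbb{G}}(m_i)^{\chi_i}, g^{\chi_i})$ for
a randomly chosen $\chi_i \leftarrow_R \mathbb{Z}_p$. Then, generate a Groth-Sahai commitment
$C_{\sigma_{i,1}} = \iota_{\mathbb{G}}(\sigma_{i,1}) \cdot \vec{f}_1^{\,r_{i,1}} \cdot \vec{f}_2^{\,s_{i,1}} \cdot \vec{f}_3^{\,t_{i,1}}$, with $r_{i,1}, s_{i,1}, t_{i,1} \leftarrow_R \mathbb{Z}_p$, and a NIWI
proof $\pi_{W,i}$ that $(X, \sigma_{i,1})$ satisfy

$e(H_{\mathbb{G}}(m_i), \sigma_{i,2}) = e(X, h)^{-1} \cdot e(\sigma_{i,1}, g)$.    (4)

This proof is obtained as $\pi_{W,i} = (h^{r_X} \cdot g^{-r_{i,1}},\ h^{s_X} \cdot g^{-s_{i,1}},\ h^{t_X} \cdot g^{-t_{i,1}})$.

4. Return the signature

$\sigma = (C_X, \{C_{\theta_j}\}_{j \in \{1,2,5\}}, \{\theta_j\}_{j \in \{3,4,6,7\}}, \pi_{AHO,1}, \pi_{AHO,2}, \{C_{\sigma_{i,1}}, \sigma_{i,2}, \pi_{W,i}\}_{i=1}^{n})$.    (5)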
Note that proofs $\pi_{AHO,1}$, $\pi_{AHO,2}$ and $\{\pi_{W,i}\}_i$ only depend on the randomness
used in commitments and not on the committed values or on the left-hand-side
members of pairing-product equations (3) and (4).

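SignDerive(pk, Msg, Msg', $\sigma$): given pk, Msg = $\{m_i\}_{i=1}^{n}$ and Msg' = $\{m'_i\}_{i=1}^{n'}$,
return $\perp$ if there exists $i \in \{1, \ldots, n'\}$ such that $m'_i \notin \{m_j\}_{j=1}^{n}$. Otherwise,
parse $\sigma$ as in (5). For each $i \in \{1, \ldots, n'\}$, let $\rho(i) \in \{1, \ldots, n\}$ be the index
such that $m'_i = m_{\rho(i)}$. Then, for each $i \in \{1, \ldots, n'\}$, do the following.

1. Re-randomize the commitment $C_X$ and the proofs $\pi_{AHO,1}$, $\pi_{AHO,2}$, $\{\pi_{W,i}\}_i$
accordingly. Let $C'_X$, $\pi'_{AHO,1}$, $\pi'_{AHO,2}$ and $\{\pi'_{W,i}\}_i$ be the randomized
commitment and proofs. Note that, in all of these commitments and proofs,
$(r_X, s_X, t_X)$ have been updated consistently.

2. Re-randomize $\{C_{\theta_j}\}_{j \in \{2,5\}}$ and $\{\theta_j\}_{j \in \{3,4,6,7\}}$ by choosing $\varrho_2, \varrho_5, \mu, \nu \leftarrow_R \mathbb{Z}_p$
and computing

$C'_{\theta_2} = C_{\theta_2} \cdot \iota_{\mathbb{G}}(\theta_4)^{\varrho_2}, \quad \theta'_3 = (\theta_3 \cdot G_r^{-\varrho_2})^{1/\mu}, \quad \theta'_4 = \theta_4^{\mu}$

$C'_{\theta_5} = C_{\theta_5} \cdot \iota_{\mathbb{G}}(\theta_7)^{\varrho_5}, \quad \theta'_6 = (\theta_6 \cdot H_r^{-\varrho_5})^{1/\nu}, \quad \theta'_7 = \theta_7^{\nu}$.

We note that, although the committed values inside $C_{\theta_2}$, $C_{\theta_5}$ have
changed, the proofs $\pi'_{AHO,1}$, $\pi'_{AHO,2}$ are still valid for the new committed
values. Then, compute $\{C''_{\theta_j}\}_{j \in \{1,2,5\}}$ by re-randomizing the
commitments $C_{\theta_1}$, $\{C'_{\theta_j}\}_{j \in \{2,5\}}$ and re-randomize the proofs $\pi'_{AHO,1}$, $\pi'_{AHO,2}$
again. Let $\pi''_{AHO,1}$, $\pi''_{AHO,2}$ be the re-randomized proofs.

3. For each $i \in \{1, \ldots, n'\}$, choose $\chi'_i \leftarrow_R \mathbb{Z}_p$ and compute

$C'_{\sigma_{\rho(i),1}} = C_{\sigma_{\rho(i),1}} \cdot \iota_{\mathbb{G}}\big(H_{\mathbb{G}}(m_{\rho(i)})^{\chi'_i}\big), \quad \sigma'_{\rho(i),2} = \sigma_{\rho(i),2} \cdot g^{\chi'_i}$.

Even though the committed value inside $C'_{\sigma_{\rho(i),1}}$ has changed, $\pi'_{W,\rho(i)}$
remains a valid proof that the updated committed value $\sigma'_{\rho(i),1}$ satisfies
$e(X, h) \cdot e(H_{\mathbb{G}}(m_{\rho(i)}), \sigma'_{\rho(i),2}) = e(\sigma'_{\rho(i),1}, g)$. The commitment $C'_{\sigma_{\rho(i),1}}$ is
then re-randomized and the proof $\pi'_{W,\rho(i)}$ is re-randomized accordingly.
Let $C''_{\sigma_{\rho(i),1}}$ and $\pi''_{W,\rho(i)}$ denote the new commitment and proof.

4. Return the signature

$\sigma' = (C''_X, \{C''_{\theta_j}\}_{j \in \{1,2,5\}}, \{\theta'_j\}_{j \in \{3,4,6,7\}}, \pi''_{AHO,1}, \pi''_{AHO,2}, \{C''_{\sigma_{\rho(i),1}}, \sigma'_{\rho(i),2}, \pi''_{W,\rho(i)}\}_{i=1}^{n'})$.    (6)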
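To see why step 2 of SignDerive leaves the first AHO verification equation intact, the following worked check may help. It is a sketch that works on the committed values directly (ignoring the Groth-Sahai layer) and writes $\varrho_2$ and $\mu$ for the two random exponents used on the first equation; these symbol names are our reading of the garbled source.

```latex
\begin{align*}
e(G_r, \theta_2 \theta_4^{\varrho_2}) \cdot
e\big((\theta_3 G_r^{-\varrho_2})^{1/\mu}, \theta_4^{\mu}\big)
  &= e(G_r, \theta_2) \cdot e(G_r, \theta_4)^{\varrho_2} \cdot
     e(\theta_3, \theta_4) \cdot e(G_r, \theta_4)^{-\varrho_2} \\
  &= e(G_r, \theta_2) \cdot e(\theta_3, \theta_4).
\end{align*}
```

The product $e(G_r, \theta_2) \cdot e(\theta_3, \theta_4)$ is therefore unchanged, so equation (3) still holds for the updated values. The same argument with $(H_r, \theta_5, \theta_6, \theta_7, \varrho_5, \nu)$ covers the second equation.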

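Verify(pk, Msg, $\sigma$): given pk, $\sigma$ and Msg = $\{m_i\}_{i=1}^{n}$, parse $\sigma$ as per (5).

1. Return 0 if $\pi_{AHO,1} = (\pi_1, \pi_2, \pi_3)$ and $\pi_{AHO,2} = (\pi_4, \pi_5, \pi_6)$ do not satisfy

$\iota_{\mathbb{G}_T}(A) \cdot E(\theta_3, \iota_{\mathbb{G}}(\theta_4))^{-1} = E(G_z, C_{\theta_1}) \cdot E(G_r, C_{\theta_2}) \cdot E(G_1, C_X) \cdot \prod_{j=1}^{3} E(\pi_j, \vec{f}_j)$
$\iota_{\mathbb{G}_T}(B) \cdot E(\theta_6, \iota_{\mathbb{G}}(\theta_7))^{-1} = E(H_z, C_{\theta_1}) \cdot E(H_r, C_{\theta_5}) \cdot E(H_1, C_X) \cdot \prod_{j=1}^{3} E(\pi_{j+3}, \vec{f}_j)$.

2. Return 1 if and only if, for each i, $\pi_{W,i} = (\pi_{W,i,1}, \pi_{W,i,2}, \pi_{W,i,3})$ satisfies

$E(h, C_X) \cdot \iota_{\mathbb{G}_T}\big(e(H_{\mathbb{G}}(m_i), \sigma_{i,2})\big) = E(g, C_{\sigma_{i,1}}) \cdot \prod_{j=1}^{3} E(\pi_{W,i,j}, \vec{f}_j)$.

In the full version of the paper, we prove that the scheme is unforgeable under
the DLIN and q-SFP assumptions and completely context hiding.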
Abstract. Inspired by cold boot attacks, Heninger and Shacham (Crypto
2009) initiated the study of the problem of how to recover an RSA pri- 
vate key from a noisy version of that key. They gave an algorithm for 
the case where some bits of the private key are known with certainty. 
Their ideas were extended by Henecka, May and Meurer (Crypto 2010) 
to produce an algorithm that works when all the key bits are subject to 
error. In this paper, we bring a coding-theoretic viewpoint to bear on the 
problem of noisy RSA key recovery. This viewpoint allows us to cast the 
previous work as part of a more general framework. In turn, this enables 
us to explain why the previous algorithms do not solve the motivating 
cold boot problem, and to design a new algorithm that does (and more). 
In addition, we are able to use concepts and tools from coding theory 
- channel capacity, list decoding algorithms, and random coding tech- 
niques - to derive bounds on the performance of the previous and our 
new algorithm. 


1 Introduction 

Cold boot attacks PH are a class of attacks wherein memory remanence effects 
are exploited to extract data from a computer’s memory. The idea is that modern 
computer memories retain data for periods of time after power is removed, so an 
attacker with physical access to a machine may be able to recover, for example, 
cryptographic key information. The time during which data is retained can be 
increased by cooling the memory chips. However, because the memory gradually 
degrades over time once power is removed, only a noisy version of the data may 
be recoverable. The question then naturally arises: given a noisy version of a 
cryptographic key, is it possible to reconstruct the original key? 

This question was addressed for broad classes of cryptosystems, both symmet- 
ric and asymmetric, by Halderman et al. in m and specifically for RSA private 
keys in PEj. Similar problems arise in the context of side-channel analysis of 
cryptographic implementations, where noisy key information may leak through 
power consumption m or timing [2j . The question is also linked to the classical 
cryptanalysis problem of recovering an RSA private key when some bits of the 
key are known, for example the most or least significant bits, or contiguous bits 
spread over a number of blocks (see, for example, the surveys in DH2I and HD!). 

X. Wang and K. Sako (Eds.): ASIACRYPT 2012, LNCS 7658, pp. 386-gU3J 2012. 
Heninger and Shacham (HS) |0| considered the setting where a random fraction
of the RSA private key bits is known with certainty. Their approach exploits
the fact that the individual bits of an RSA private key of the form
sk = (p,q,d,d p ,d q ) must satisfy certain algebraic relations. This enables the 
recovery of the private key in a bit-by-bit fashion, starting with the least signifi- 
cant bits, by growing a search tree. It is easy to prune the search tree to remove 
partial solutions which do not match with the known key bits. The resulting algo- 
rithm will always succeed in recovering the private key, since the pruning process 
will never remove a partial version of the correct solution. On the other hand, 
when only few bits are known, the search tree may grow very large, and the HS 
algorithm will blow up. It was proved in j2| that, under reasonable assumptions 
concerning randomness of incorrect solutions, the HS algorithm will efficiently 
recover an n-bit RSA private key in time $O(n^2)$ with probability $1 - 1/n^2$ when a
random fraction of at least 0.27 of the private key bits are known with certainty. 
These theoretical results are well-matched by experiments reported in jSj . These 
experiments also confirm that the HS algorithm has good performance when the 
known fraction is as small as 0.24, and the analysis of |0j extends to cases where 
the RSA private key sk is of the form (p, q, d) or (p, q).

Henecka, May and Meurer (HMM) j^j took the ideas of |2j and developed them 
further to address the situation where no key bits are known with certainty. They 
consider the symmetric case where the two possible bit flips $0 \to 1$, $1 \to 0$ have
equal probability $\delta$. Their main idea was to consider t bit-slices at a time of
possible solutions to the equations relating the bits of sk, instead of single bits
at a time as in the HS algorithm. In the formulation where $sk = (p, q, d, d_p, d_q)$,
this yields $2^t$ candidate solutions on $5t$ new private key bits for each starting
candidate at each stage of the algorithm. The HMM algorithm then computes the 
Hamming distance between the candidate solutions and the noisy key, keeping 
all candidates for which this metric is less than some carefully chosen threshold 
C. This replaces the procedure of looking for exact matches used in the HS 
algorithm. Of course, now the correct solution may fail this statistical test and 
be rejected; moreover the number of candidate solutions retained may explode if 
C is set too loosely. Nevertheless, it was shown in 0 that the HMM algorithm is 
efficient and has reasonable success in outputting the correct solution provided 
that $\delta < 0.237$. Again, the analysis depends on assumptions concerning the
random behaviour of wrong solutions. To support the analysis, [Bj reports the 
results of experiments for different noise levels and algorithmic parameters. For 
example, the algorithm can cope with $\delta = 0.20$.

In recent work independent of ours, Sarkar and Maitra m revisited the work 
of (HI, applying the HMM algorithm to break Chinese Remainder implementa- 
tions of RSA with low weight decryption exponents and giving ad hoc heuristics 
to improve the algorithm. 

Limitations of Previous Work and Open Questions: Although inspired 
by cold boot attacks, it transpires that neither the HS algorithm nor the HMM 
algorithm actually solve the motivating cold boot problem. Let us see why. 
One observation made in pm is that for a given region of memory, the decay
of memory bits is overwhelmingly either $0 \to 1$ or $1 \to 0$, while the decay
direction in a given region can be inferred by comparing the number of 0s and
1s (since for an uncorrupted private key, we expect these to be roughly equal).
Thus, in a $1 \to 0$ region, a 1 bit in the noisy version of the key is known (with
high probability) to correspond to a 1 bit in the original key.

In the case of P , the assumption is made that a certain fraction of the RSA 
private key bits (both 0s and 1s) is known with certainty. But, in the cold boot
scenario, only 1 (or 0) bits are known, and not a mixture of both. Fortunately, 
the authors of P have informed us that their algorithm does still work when 
only 0 or only 1 bits are known, but this is not the case it was designed for, 
and, formally, the performance guarantees obtained in P do not apply in this 
case. Furthermore, in a real cold boot attack, bits are never known with absolute 
certainty, because even in a $1 \to 0$ region, say, bit flips in the reverse direction
can occur. Halderman et al. report rates of 0.05% to 0.1% for this event. Such 
an event will completely derail the HS algorithm, as it will result in the correct 
solution being eliminated from the search tree. Based on an occurrence rate of 
0.1%, this kind of fatal event can be expected to arise around 2.5 to 5 times in 
a real key recovery attack for 1024-bit RSA moduli with $sk = (p, q, d, d_p, d_q)$.
Thus, the HS algorithm really only applies to an “idealised” cold boot setting, 
where some bits are known for sure. 

The HMM algorithm is designed to work for the symmetric case where the 
two possible bit flips have equal probability $\delta$. Yet, in a cold boot attack, in a
$1 \to 0$ region say, $\alpha := \Pr(0 \to 1)$ will be very small (though non-zero), while
$\beta := \Pr(1 \to 0)$ may be relatively large, and perhaps even greater than 0.5 in
a very degraded case. The use of Hamming distance as a metric for comparison 
and the setting of the threshold C are closely tied to the symmetric case, and it 
is not immediately clear how one can generalise the HMM approach to handle 
the type of errors occurring in real cold boot attacks. So it does not solve the 
cold boot problem for RSA keys. 

Intriguing features of the work in PP are the constants 0.27 and 0.237, which 
bound the fraction of known bits/noise rate the HS and HMM algorithms can 
handle. One can trace through the relevant analysis to see how these numbers 
emerge, but it would be more satisfying to have a deeper, unifying explanation. 
One might also wonder if these bounds are best possible or whether significant 
improvements might yet be forthcoming. Is there any ultimate limit to the noise 
level that these kinds of algorithms can deal with? And can we design an algo- 
rithm that works in the true cold boot setting, or for fully general noise models 
that might be expected to occur in other types of side channel attack? 

Our contributions: We show how to recast the problem of noisy RSA key recov- 
ery as a problem in coding theory. That such a connection exists should be no 
surprise: after all, we are in a situation where bits are only known with certain 
probabilities and we wish to recover the true bits. However, this connection opens 
up the opportunity to apply to our problem the full gamut of sophisticated tools 
that have been developed by coding theorists over the last 60 years. We sketch
this connection and its main consequences next. 

Recall that in the HMM algorithm, we generate from each solution so far a set
of $2^t$ candidate solutions on $5t$ new bits. We now view the set of $2^t$ candidates
as being a code, with one codeword s (representing bits of the true private key)
being selected and transmitted over a noisy channel, resulting in a received word
r (representing $5t$ bits of the noisy version of the key). In the HMM case, the
noise is realised via bit flipping with probability $\delta$. The HS algorithm can be
seen as arising from the special case t = 1, where the noise now corresponds
to erasing a fraction of key bits instead of flipping them. Alternatively, we can
consider a generalisation of the HS algorithm which considers $5t$ bits at a time,
generated just as in the HMM algorithm, and which then filters the resulting $2^t$
candidates based on matching with known key bits. Because filtering is based on 
exact matching, this algorithm has the same output as the original HS algorithm. 
This brings the two algorithms under a single umbrella. 

In general, in coding theory, the way in which s is transformed into r depends 
on the channel model, which in its full generality defines the probabilities Pr(r|s) 
over all possible pairs (s, r). In the case of 0, the assumption is that particular 
bits are known with certainty and others are not known at all, with the bits all 
being treated independently. The appropriate channel model is then an erasure 
channel, meaning that bits are independently either erased or transmitted cor- 
rectly over the channel, with the receiver knowing the positions of the erasures. 
In the case of |Hj , the appropriate channel model is the binary symmetric channel 
with cross-over probability $\delta$. It also emerges that the appropriate channel model
for the true cold boot setting is a binary non-symmetric channel with cross-over
probabilities $(\alpha, \beta)$. In general, the problem we are faced with is to decode r,
with the aim being to reproduce s with high probability.
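The three channel models named above can be simulated in a few lines. The sketch below is only illustrative: the function names, the use of `None` to mark an erasure, and the parameter values are our own choices and do not come from the paper.

```python
import random

def erasure_channel(bits, rho, rng):
    """Erase each bit independently with probability rho.

    An erased position is reported as None, so the receiver knows where
    the erasures are (the HS setting)."""
    return [None if rng.random() < rho else b for b in bits]

def binary_channel(bits, alpha, beta, rng):
    """Memoryless binary channel with Pr(0 -> 1) = alpha and Pr(1 -> 0) = beta."""
    received = []
    for b in bits:
        flip_probability = alpha if b == 0 else beta
        received.append(b ^ 1 if rng.random() < flip_probability else b)
    return received

def symmetric_channel(bits, delta, rng):
    """Binary symmetric channel (the HMM setting): both flips have probability delta."""
    return binary_channel(bits, delta, delta, rng)

rng = random.Random(2012)                      # fixed seed, illustration only
sent = [rng.randrange(2) for _ in range(1000)]
partly_known = erasure_channel(sent, 0.5, rng)
idealised_cold_boot = binary_channel(sent, 0.0, 0.6, rng)   # a 1 -> 0 region
```

With `alpha = 0.0` every 1 in the received word is guaranteed to be a 1 in the transmitted word, which is the idealised cold boot situation described in the text.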

When couched in this language, it becomes obvious that the HS and HMM 
algorithms do not solve the original cold boot problem - simply put these al- 
gorithms use inappropriate channel models for that specific problem. We can 
also use this viewpoint to derive limits on the performance of any procedure for 
selecting which candidate solutions to keep in an HMM-style algorithm. To see 
why, we recall that the converse to Shannon’s noisy-channel coding theorem H3 
states that no combination of code and decoding procedure can jointly achieve 
arbitrarily reliable decoding when the code rate exceeds the (Shannon) capac- 
ity of the channel. Moreover, there are analogues of the converse of Shannon’s 
theorem for so-called list decoding that essentially show that channel capacity is 
also the barrier to any efficient algorithm outputting lists of candidates, as the
HS and HMM algorithms do. 

When sk is of the form $(p, q, d, d_p, d_q)$, for example, the code rate is fixed at 1/5
(we have $2^t$ codewords and length $5t$). The channel capacity can be calculated
as a function of the channel model and its parameters. For example, for the
erasure channel with erasure probability $\rho$ (meaning that a fraction $1 - \rho$ of the
bits are known with certainty), the capacity is simply $1 - \rho$. Then we see that
the limiting value is $\rho = 0.8$, meaning that the fraction of known bits must be
at least 0.2 to achieve arbitrarily reliable, efficient decoding. The analysis in j2j
needs that fraction to be at least 0.27, though a fraction as low as 0.24 could be 
handled in practice. Thus a capacity analysis suggests that there should be room 
to improve the HS algorithm further, but capacity shows that it is impossible
to go below a fraction 0.2 of known bits with an efficient algorithm. See Section 0
for further details on list decoding and its application to the analysis of the HS 
and HMM algorithms. 
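The capacity argument can be checked numerically. The sketch below uses the standard closed-form capacities of the erasure channel, the binary symmetric channel and the Z-channel, and finds by bisection the noise level at which each capacity drops to the code rate 1/5. The helper names are ours, and the Z-channel is used as a stand-in for the idealised cold boot case with no 0 to 1 flips.

```python
import math

def binary_entropy(x):
    """Binary entropy function, in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def capacity_erasure(rho):
    """Erasure channel with erasure probability rho."""
    return 1 - rho

def capacity_symmetric(delta):
    """Binary symmetric channel with cross-over probability delta."""
    return 1 - binary_entropy(delta)

def capacity_z(beta):
    """Z-channel: 0 is never flipped, 1 is flipped to 0 with probability beta."""
    if beta <= 0.0:
        return 1.0
    if beta >= 1.0:
        return 0.0
    return math.log2(1 + (1 - beta) * beta ** (beta / (1 - beta)))

def noise_limit(capacity, rate, lo, hi, steps=60):
    """Largest noise level in [lo, hi] with capacity >= rate.

    Assumes capacity is decreasing on [lo, hi]."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if capacity(mid) >= rate:
            lo = mid
        else:
            hi = mid
    return lo

RATE = 1 / 5   # code rate when sk = (p, q, d, d_p, d_q)
print(noise_limit(capacity_erasure, RATE, 0.0, 1.0))    # erasure probability limit
print(noise_limit(capacity_symmetric, RATE, 0.0, 0.5))  # symmetric flip limit
print(noise_limit(capacity_z, RATE, 0.0, 1.0))          # 1 -> 0 flip limit, alpha = 0
```

The first limit is the 0.8 erasure probability (0.2 known bits) derived in the text. The second comes out near 0.243, slightly above the 0.237 that the HMM analysis reaches, and the third is close to the bound of 0.666 quoted later for the idealised cold boot case.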

Informed by our coding-theoretic viewpoint, we derive a new key recovery 
algorithm that works for any (memoryless) binary channel and therefore is ap- 
plicable to the cold boot setting (and more). In essence, we modify the HMM 
algorithm to use a likelihood statistic in place of the Hamming metric when 
selecting from the candidate codewords. We keep the L codewords having the 
highest values of this likelihood statistic and reject the others. An important 
consequence of this algorithmic choice is that our algorithm has deterministic 
running time $O(L 2^t n / t)$ and, when implemented using a stack, deterministic
memory consumption $O(L + t)$. This stands in contrast to the running time and
memory usage of the HS and HMM algorithms, which may blow up when the 
erasure/error rates are high. We note that private RSA keys are big enough that 
they may cross regions when stored in memory. We can handle this by chang- 
ing the likelihood statistic used in our algorithm at the appropriate transition 
points, requiring only a simple modification to our approach. In the full version, 
we give an analysis of the success probability of our new algorithm, under dif- 
ferent randomness hypotheses, using coding-theoretic tools. Essentially, we are 
able to show that, as $t \to \infty$, its success probability tends to 1 provided the
code rate (1/5 when $sk = (p, q, d, d_p, d_q)$) remains below the channel capacity.
Moreover, from the converse to Shannon’s theorem, we are unlikely to be able 
to improve this result if reliable key recovery is required. 
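The selection rule described above (score each candidate by a likelihood statistic and keep the L best) can be sketched as follows. This is a minimal illustration under our own assumptions: candidates and the received word are plain tuples of bits, and the helper names `log_likelihood` and `keep_best` are hypothetical rather than taken from the paper.

```python
import heapq
import math

def log_likelihood(candidate, received, alpha, beta):
    """log Pr(received | candidate) for a memoryless binary channel
    with Pr(0 -> 1) = alpha and Pr(1 -> 0) = beta."""
    transition = {(0, 0): 1 - alpha, (0, 1): alpha,
                  (1, 0): beta, (1, 1): 1 - beta}
    total = 0.0
    for sent_bit, received_bit in zip(candidate, received):
        probability = transition[(sent_bit, received_bit)]
        if probability == 0.0:
            return -math.inf      # impossible under this channel model
        total += math.log(probability)
    return total

def keep_best(candidates, received, alpha, beta, L):
    """Keep the L candidates with the highest log-likelihood, best first."""
    return heapq.nlargest(
        L, candidates,
        key=lambda c: log_likelihood(c, received, alpha, beta))

# Idealised 1 -> 0 region: alpha = 0, so a received 1 must come from a sent 1.
received = (1, 0, 0, 1)
candidates = [(1, 0, 0, 1), (1, 1, 0, 1), (0, 0, 0, 1), (1, 1, 1, 1)]
survivors = keep_best(candidates, received, alpha=0.0, beta=0.4, L=2)
```

Because exactly L candidates survive each stage, the amount of work per stage is fixed in advance, which is the source of the deterministic running time mentioned in the text. A Hamming-distance threshold, by contrast, can admit any number of candidates.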

We include the results of extensive experiments using our new algorithm. 
These demonstrate that our approach matches or outperforms the HS and HMM 
algorithms in the cases they are designed for, and achieves results close to the 
limits imposed by our capacity analysis more generally. For example, in the 
symmetric case with $\delta = 0.20$, we can achieve a 20% success rate in recovering
keys for t = 18 and L = 32. This is comparable to the results of jHj- Furthermore, 
for the same t and L we achieve a 4% success rate for $\delta = 0.22$, whilst jSj does
not report any experiments for an error rate this high. As another example, our 
algorithm can handle the idealised cold boot scenario by setting $\alpha = 0$ (in which
case all the 1 bits in r are known with certainty, i.e. we are in a $1 \to 0$ region).
Here, our capacity analysis puts a bound of 0.666 on $\beta$ for reliable key recovery.
Using our algorithm, we can recover keys for $\beta = 0.6$ with a 13% success rate
using t = 18 and L = 32, whereas the HS algorithm can only reach $\beta = 0.52$
(and this under the assumption that the experimental results reported in [Oj for 
a mixture of known 0 and 1 bits do translate to the same performance for the 
case where only 1 bits are known). In the same setting, we can even recover keys 
up to $\beta = 0.63$ with a non-zero success rate. We also have similar experimental
results for the ‘true’ cold boot setting where both $\alpha$ and $\beta$ are non-zero, and for
the situation where sk is of the form (p, q, d) or (p, q).

Paper Organisation: The remainder of this paper is organised as follows. In the 
next section, we give further background on the algorithms of jB10. In Sectional 
we develop the connection with coding theory and explain how to use it to derive 
limits on the performance of noisy RSA key recovery algorithms. Section 0| de- 
scribes our new maximum likelihood list decoding algorithm. Section 0 presents 
our experimental results. Finally, Section E| contains some closing remarks and 
open problems. 

2 The HS and HMM Algorithms 

Let ( N , e) be the RSA public key, where N = pq is an n-bit RSA modulus, 
and p, q are balanced primes. As with (BIEI, we assume throughout that e is 
small, say e = 3 or $e = 2^{16} + 1$; for empirical justification of this assumption,
see |EJ- We start by assuming that private keys sk follow the PKCS#1 standard 
and so are of the form $(N, p, q, e, d, d_p, d_q, q_p^{-1})$, where d is the decryption key,
$d_p = d \bmod (p - 1)$, $d_q = d \bmod (q - 1)$ and $q_p^{-1} = q^{-1} \bmod p$. However, neither
the algorithms of |S1I0| nor ours make use of $q_p^{-1}$, so we henceforth omit this
information. Furthermore, we assume N and e are publicly known, so we work 
only with the tuple $sk = (p, q, d, d_p, d_q)$. We will also consider attacks where the
private key contains less information: either sk = (p, q, d) or sk = (p, q).

Now assume we are given a degraded version $\widetilde{sk} = (\tilde{p}, \tilde{q}, \tilde{d}, \tilde{d}_p, \tilde{d}_q)$ of the key.


We start with the four RSA equations:

$N = pq$    (1)

$ed = k(N - p - q + 1) + 1$    (2)

$ed_p = k_p(p - 1) + 1$    (3)

$ed_q = k_q(q - 1) + 1$,    (4)


where k, k p and k q are integers to be determined. A method for doing so is given 
in jOJ : first it is shown that 0 < k < e; then, since e is small, we may enumerate 

$d(k') := \left\lfloor \frac{k'(N + 1) + 1}{e} \right\rfloor$

for all 0 < k' < e. We then find the k' such that d(k') is “closest” to the noisy value $\tilde{d}$ of d in the most
significant half of the bits. Simple procedures for doing this are given in |BII2S. 
In the more general setting where bit flips can occur in both directions and 
with different probabilities, we proceed as follows. First, we estimate parameters 
$\alpha = \Pr(0 \to 1)$ and $\beta = \Pr(1 \to 0)$ from known bits, e.g. from a noisy version of
N that is adjacent in memory to the private key. Second, we compute for each
k' an approximate log-likelihood using the expression

$n_{01} \log \alpha + n_{00} \log(1 - \alpha) + n_{10} \log \beta + n_{11} \log(1 - \beta)$

where $n_{01}$ is the number of positions in the most significant half where a 0
appears in d(k') and a 1 appears in the noisy value $\tilde{d}$ of d, etc. Finally, we select the k' that provides
the highest log-likelihood. 
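The search for k can be sketched as below. The code enumerates d(k') for every 0 < k' < e, scores the most significant half against a noisy copy of d with the log-likelihood expression above, and returns the best k'. The function names and the toy key (two well-known 30-bit primes) are our own assumptions, and 0 < alpha, beta < 1 is assumed so that all logarithms are finite.

```python
import math

def d_of_k(k_prime, N, e):
    """Candidate d(k') = floor((k'(N + 1) + 1) / e)."""
    return (k_prime * (N + 1) + 1) // e

def top_half_bits(x, n):
    """Most significant half of x viewed as an n-bit string, MSB first."""
    return [(x >> i) & 1 for i in range(n - 1, n // 2 - 1, -1)]

def approx_log_likelihood(candidate_bits, noisy_bits, alpha, beta):
    """n01 log(alpha) + n00 log(1-alpha) + n10 log(beta) + n11 log(1-beta),
    where n_ab counts positions with bit a in the candidate and b in the noisy copy."""
    n = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for c, r in zip(candidate_bits, noisy_bits):
        n[(c, r)] += 1
    return (n[(0, 1)] * math.log(alpha) + n[(0, 0)] * math.log(1 - alpha)
            + n[(1, 0)] * math.log(beta) + n[(1, 1)] * math.log(1 - beta))

def most_likely_k(noisy_d, N, e, alpha, beta):
    """Return the k' in 1..e-1 whose d(k') best explains the top half of noisy_d."""
    n = N.bit_length()
    noisy = top_half_bits(noisy_d, n)
    return max(range(1, e), key=lambda kp: approx_log_likelihood(
        top_half_bits(d_of_k(kp, N, e), n), noisy, alpha, beta))

# Toy key for illustration only (far too small to be secure).
p, q, e = 1000000007, 998244353, 65537
N, phi = p * q, (p - 1) * (q - 1)
d = pow(e, -1, phi)          # requires Python 3.8+
k = (e * d - 1) // phi       # the true k of equation (2)
```

For the true k, d(k) = d + floor(k(p + q)/e), so d(k) and d differ by less than p + q and therefore agree on almost all of the most significant half. That is why the comparison is restricted to those bits.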

At the end of this procedure, with high probability we will have k' = k and 
we will have recovered the most significant half of the bits of d. Now we wish to 
find k p and k q . By manipulating the above equations we see that 

$k_p^2 - (k(N - 1) + 1) k_p - k \equiv 0 \bmod e$

If e is prime (as in the most common case $e = 2^{16} + 1$) there will only be two
solutions to this equation. One will be k p and the other k q . If e is not prime we 
will have to try all possible pairs of solutions in the remainder of the algorithm. 
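Since e is small, the quadratic congruence can simply be solved by trying every residue. The sketch below does that and checks the result on a toy key; the function name and the key are our own illustrative choices.

```python
def kp_kq_candidates(k, N, e):
    """All x in [0, e) with x^2 - (k(N - 1) + 1) x - k = 0 (mod e)."""
    b = (k * (N - 1) + 1) % e
    return [x for x in range(e) if (x * x - b * x - k) % e == 0]

# Toy key for illustration only (far too small to be secure).
p, q, e = 1000000007, 998244353, 65537
N, phi = p * q, (p - 1) * (q - 1)
d = pow(e, -1, phi)                      # requires Python 3.8+
k = (e * d - 1) // phi                   # equation (2)
k_p = (e * (d % (p - 1)) - 1) // (p - 1) # equation (3)
k_q = (e * (d % (q - 1)) - 1) // (q - 1) # equation (4)

roots = kp_kq_candidates(k, N, e)
```

The reason both values appear among the roots is that equations (1) to (4) imply $k_p + k_q \equiv k(N - 1) + 1$ and $k_p k_q \equiv -k$ modulo e, so $k_p$ and $k_q$ are the two roots of the quadratic. Which root is $k_p$ and which is $k_q$ is not determined at this point.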

Now, for integers x, we define $\tau(x) := \max\{i \in \mathbb{N} : 2^i \mid x\}$. Then it is easy to
see that $2^{\tau(k_p)+1}$ divides $k_p(p - 1)$, $2^{\tau(k_q)+1}$ divides $k_q(q - 1)$ and $2^{\tau(k)+2}$ divides
$k\phi(N)$. These facts, along with relations (2)-(4), allow us to see that

$d_p \equiv e^{-1} \bmod 2^{\tau(k_p)+1}$
$d_q \equiv e^{-1} \bmod 2^{\tau(k_q)+1}$
$d \equiv e^{-1} \bmod 2^{\tau(k)+2}$.

This allows us to correct the least significant bits of d, d p and d q . Furthermore 
we can calculate slice(0), where we define

slice(i) := $(p[i], q[i], d[i + \tau(k)], d_p[i + \tau(k_p)], d_q[i + \tau(k_q)])$,

with x[i] denoting the i-th bit of the string x.
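The correction of the least significant bits can be illustrated as follows. The sketch computes $\tau$ and the low-order bits of d, $d_p$ and $d_q$ that are forced by the congruences above, then compares them with a toy key. Function names and the key are our own assumptions.

```python
def tau(x):
    """Largest i such that 2**i divides x (x must be non-zero)."""
    return (x & -x).bit_length() - 1

def forced_low_bits(e, k_value, extra):
    """Return (value, width): the private exponent is congruent to
    e^{-1} modulo 2**width, where width = tau(k_value) + extra."""
    width = tau(k_value) + extra
    return pow(e, -1, 2 ** width), width   # requires Python 3.8+, e odd

# Toy key for illustration only (far too small to be secure).
p, q, e = 1000000007, 998244353, 65537
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)
d_p, d_q = d % (p - 1), d % (q - 1)
k = (e * d - 1) // phi
k_p = (e * d_p - 1) // (p - 1)
k_q = (e * d_q - 1) // (q - 1)

low_d, width_d = forced_low_bits(e, k, 2)
low_dp, width_dp = forced_low_bits(e, k_p, 1)
low_dq, width_dq = forced_low_bits(e, k_q, 1)
```

These known low-order bits are what make the offsets $\tau(k)$, $\tau(k_p)$ and $\tau(k_q)$ appear in the definition of slice(i): the first unknown bit of each exponent sits just above the forced ones.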

Now we are ready to explain the main idea behind the algorithm of 0. Suppose
we have a solution $(p', q', d', d'_p, d'_q)$ from slice(0) to slice(i - 1). Then 0
uses a multivariate version of Hensel’s Lemma to show that the bits involved in
slice(i) must satisfy the following congruences:

$p[i] + q[i] = (N - p'q')[i] \bmod 2$

$d[i + \tau(k)] + p[i] + q[i] = (k(N + 1) + 1 - k(p' + q') - ed')[i + \tau(k)] \bmod 2$

$d_p[i + \tau(k_p)] + p[i] = (k_p(p' - 1) + 1 - ed'_p)[i + \tau(k_p)] \bmod 2$

$d_q[i + \tau(k_q)] + q[i] = (k_q(q' - 1) + 1 - ed'_q)[i + \tau(k_q)] \bmod 2$.

Because we have 4 constraints on 5 unknowns, there are exactly 2 possible solutions
for slice(i), rather than 32. This is then used in 0 as the basis of building
a search tree for the unknown private key bits. At each node in the tree, representing
a partial solution up to slice(i - 1), at most two successor nodes are
added by the above procedure. Moreover, since a random fraction of the bits 
is assumed to be known with certainty, the tree can be pruned of any partial 
solutions that are not consistent with these known bits. Clearly, if the fraction of 
known bits is large enough, then the tree will be highly pruned and the number 
of nodes in the tree will be small. The analysis of 0 shows that if the fraction of 
known bits is at least 0.27, then the tree’s size remains close to linear in n, the 
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size of the RSA modulus, meaning that an efficient algorithm results. A similar 
algorithm and analysis can be given for the case where sk is of the form ( p , q, d) 
or (p,q)', in each case, there are exactly 2 possible solutions for each slice(i). 
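
To make the branching and pruning concrete, here is a minimal Python sketch for the simplest case sk = (p, q), where only the first congruence is needed. The function name `lift_pq`, the dictionaries of known bits and the toy modulus are assumptions for illustration; this is not the authors' implementation.

```python
def lift_pq(N, nbits, known_p=None, known_q=None):
    """Enumerate all (p, q) with p*q = N (mod 2^nbits) that agree with known bits.

    known_p / known_q map a bit index to its known value (0 or 1).
    """
    known_p = known_p or {}
    known_q = known_q or {}
    cands = [(1, 1)]                          # slice(0): p and q are odd
    for i in range(1, nbits):
        nxt = []
        for p, q in cands:
            c = ((N - p * q) >> i) & 1        # p[i] + q[i] = (N - p'q')[i] mod 2
            for pb in (0, 1):                 # two solutions per slice
                qb = pb ^ c
                if known_p.get(i, pb) != pb or known_q.get(i, qb) != qb:
                    continue                  # prune: contradicts a known bit
                nxt.append((p | (pb << i), q | (qb << i)))
        cands = nxt
    return cands

N = 61 * 53                                   # toy modulus, 6-bit primes
unpruned = lift_pq(N, 6)                      # doubles at every slice: 2^5 candidates
pruned = lift_pq(N, 6, known_p={1: 0, 3: 1}, known_q={2: 1, 4: 1})
```

With one known bit in each of slices 1 to 4, only the final slice branches, so the pruned tree ends with two leaves, one of which is the true pair.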

Instead of doing Hensel lifting bit-by-bit and pruning on each bit, the HMM
algorithm performs t Hensel lifts for some parameter t, yielding, for each
surviving candidate solution on slice(0) to slice(i - 1), a tree of depth t whose $2^t$
leaf nodes represent candidate solutions on slices slice(0) to slice(i + t - 1),
involving 5t new bits (in slice(i) to slice(i + t - 1)). A solution is kept for the next
iteration if the Hamming distance between the 5t new bits and the corresponding
vector of noisy bits is less than some threshold C. Clearly the HS algorithm could
also be modified in this way, lifting t times and then doing pruning based on
matching known key bits. Alternatively, one can view the HS algorithm as being
the special case t = 1 of the HMM algorithm (with a different pruning procedure).
The HMM algorithm can also be adapted to work with sk of the form (p, q, d)
or (p, q). Henecka et al. showed how to select C and t so as to guarantee that
their algorithm is efficient and produces the correct solution with a reasonable
success rate. In particular, they were able to show that this is the case provided
the probability $\delta$ of a bit flip is at most 0.237.

At each stage in the HMM algorithm, candidate solutions on t new slices
are constructed. Then roughly n/(2t) iterations or stages of the algorithm are
needed, since all the quantities being recovered contain at most n/2 bits. As
pointed out by Henecka et al., only half this number of stages is required since
once we have the least significant half of the bits of the private key, the entire
private key can be recovered using a result of Coppersmith. At their conclusion,
the HS and HMM algorithms output lists of candidate solutions rather than a
single solution. But it is easy to verify the correctness of each candidate by using
a trial encryption and decryption, say. Thus the success rate of the algorithms is
defined to be the probability that the correct solution is on the output list. We
adopt the same measure of success in the remainder of the paper.

3 The Coding-Theoretic Viewpoint 

In this section, we develop our coding-theoretic viewpoint on the HS and HMM 
algorithms, using it to derive limits on the performance of these and similar 
algorithms. In particular, we will explain how channel capacity plays a crucial 
role in setting these limits. 

We begin by defining the parameter m. We set m = 5 when sk = (p, q, d, $d_p$, $d_q$),
m = 3 when sk = (p, q, d), and m = 2 when sk = (p, q). Consider a stage of the
HMM algorithm, commencing with M partial solutions that have survived the
previous stage's pruning step. The HMM algorithm produces a total of $M2^t$
candidate solutions on mt bits, prior to pruning. We label these $s_1, \ldots, s_{M2^t}$, let
C denote the set of all $M2^t$ candidates, and use r to denote the corresponding
vector of mt noisy bits in sk.

Now we think of C as being a code. This code has rate $R \ge 1/m$ (it contains
$M2^t \ge 2^t$ codewords of length mt), but its other standard parameters such as its
minimum distance are unknown (and immaterial to our analysis). The problem of
recovering the correct candidate $s_j$ given r is clearly just the problem of decoding
this code. Now both the HS and HMM algorithms have pruning steps that output
lists of candidates for the correct solution, with the list size being dynamic in both
cases and depending on the number of candidates surviving the relevant filtering
process (based either on exact matches for the HS algorithm or on Hamming
distance for the HMM algorithm). In this sense, the HS and HMM algorithms are
performing types of list decoding, an alternative to the usual unique decoding of
codes that was originally proposed by Elias [2].

To complete the picture, we need to discuss what error and channel models are
used by Heninger and Shacham and by Henecka et al., and what models are
appropriate to the cold boot setting. As noted in the introduction, Heninger and
Shacham assume that some bits of r are known exactly, while no information at
all is known about the other bits. This corresponds to an erasure model for errors,
and an erasure channel. Usually, this is defined in terms of a parameter $\rho$
representing the fraction of erasures. So $1 - \rho$ represents the fraction of known bits,
a parameter denoted $\delta$ by Heninger and Shacham. On the other hand, Henecka
et al. assume that all bits of r are obtained from the correct $s_j$ by independent bit
flipping with probability $\delta$. In standard coding terminology, we have a
(memoryless) binary symmetric channel with crossover probability $\delta$. From the
experimental data reported for cold boot attacks, an appropriate model for the
cold boot setting would be a binary non-symmetric channel with crossover
probabilities $(\alpha, \beta)$, with $\alpha$ being small and $\beta$ being significantly larger in a
1 -> 0 region (and vice-versa in a 0 -> 1 region). In an idealised cold boot case, we
could assume $\alpha = 0$, meaning that a 0 -> 1 bit flip can never occur, so that all 1
bits in r are known with certainty. This is better known as a Z-channel in the
coding-theoretic literature.

This viewpoint highlights the exact differences between the settings considered
in the earlier works and the cold boot setting. It also reveals that, while the HS
algorithm can be applied for the Z-channel seen in the idealised cold boot setting,
there is no guarantee that the performance proven for it by Heninger and Shacham
for the erasure channel will transfer to the Z-channel. Moreover, one might hope
for substantial improvements to the HS algorithm if one could somehow take into
account the (partial) information known about 0 bits as well as the exact
information known about 1 bits.

3.1 The Link to Channel Capacity 

We can use this coding viewpoint to derive limits on the performance of any 
procedure for selecting which candidate solutions to keep in the HS and HMM 
algorithms. To see why, we recall that the converse to Shannon’s noisy-channel 
coding theorem [13] states that no combination of code and decoding procedure 
can jointly achieve arbitrarily reliable decoding when the code rate exceeds the 
capacity of the channel. Our code rate is at least 1/m where m = 2, 3 or 5 and 
the channel capacity can be calculated as a function of the channel model and 
its parameters. 

Two caveats must be made here. Firstly, capacity only puts limits on reliable
decoding, and even decoding with low success probability is of interest in
cryptanalysis. Secondly, Shannon's result applies only to decoding algorithms
that output a single codeword s, while both the HS and HMM algorithms are
permitted to output many candidates at each stage, with the final output list only
being required to contain the correct private key. Perhaps such list-outputting
algorithms can surpass the bounds imposed by Shannon's theorem? Indeed, the
HS algorithm is guaranteed to output the correct key provided the algorithm
terminates. Similarly, the threshold C in the HMM algorithm can always be set
to a value that ensures that every candidate passes the test and is kept for the
next stage, thus guaranteeing that the algorithm is always successful. However,
neither of these variants would be efficient and in fact there are analogues of the
converse of Shannon's noisy-channel coding theorem that essentially show that
capacity is the barrier for efficient list decoding too.

For the binary symmetric channel, it is shown in [3, Theorem 3.4] that if C is
any code of length n and rate $1 - H_2(\delta) + \epsilon$ for some $\epsilon > 0$, then some word r
is such that the Hamming sphere of radius $\delta n$ around r contains at least $2^{\epsilon n/2}$
codewords. Here $H_2(\cdot)$ is the binary entropy function:

$$H_2(x) = -x \log_2(x) - (1 - x) \log_2(1 - x)$$

and $1 - H_2(\delta)$ is just the capacity of the channel. The proof also shows that, over
a random choice of r, the average number of codewords in a sphere of radius $\delta n$
around r is $2^{\epsilon n/2}$. Since the expected number of errors in r is $\delta n$, we expect the
correct codeword to be in this sphere, along with $2^{\epsilon n/2}$ other codewords. This
implies that, if the rate of the code exceeds the channel capacity $1 - H_2(\delta)$ by a
constant amount $\epsilon$, then C cannot be list decoded using a polynomial-sized list,
either in the worst case or on average, as $n \to \infty$.

An analogous result can be proved for the erasure channel, based on a similarly
simple counting argument to that used in the proof of [3, Theorem 3.4]: if $\rho$ is
the erasure probability and C is any code of rate $1 - \rho + \epsilon$ (i.e. $\epsilon$ above the
erasure channel's capacity), then it can be shown that on average there will be
$2^{\epsilon n}$ codewords that differ from r only in its erasure positions, assuming r contains
$\rho n$ erasure symbols. Hence reliable list decoding for C cannot be achieved using a
polynomial-sized list.

In the next sub-section, we will examine in more detail the implications of 
these results on list decoding for the HS and HMM algorithms. 

3.2 Implications of the Capacity Analysis 

The Binary Symmetric Channel and the HMM Algorithm. If the HMM
algorithm is to have reasonable success probability in recovering the key, then
at each stage, it must set the threshold C in such a way that all words $s_i \in C$
with $d_H(s_i, r) \approx \delta mt$ are accepted by the algorithm, where $d_H$ denotes Hamming
distance. This is because $\delta mt$ is the expected number of errors occurring in r, and
if the threshold is set below this value, then the correct codeword is highly likely
to be rejected by the algorithm. (In fact, the HMM algorithm sets C to be slightly
higher than this, which makes good sense given that there is an even chance of
there being more than $\delta mt$ errors.) Recall that we have rate $R \ge 1/m$. Now
suppose $\delta$ is such that $R = 1 - H_2(\delta) + \epsilon$ for some $\epsilon > 0$, i.e. $\delta$ is chosen so that
the code rate is just above capacity. Then the argument above shows that there
will be on average at least $2^{\epsilon mt/2}$ codewords on the output list at each stage. Thus,
as soon as $\delta$ is such that R exceeds capacity by a constant amount $\epsilon$, there must be
a blow-up in the algorithm's output size at each stage, and the algorithm will be
inefficient asymptotically.

We write $C_{BSC}(\delta) = 1 - H_2(\delta)$ for the capacity of the binary symmetric
channel. Table 1 shows that $C_{BSC}(\delta) = 0.2$ when $\delta = 0.243$. Thus what our
capacity analysis shows is that the best error rate one could hope to deal with
in the HMM algorithm when m = 5 is $\delta = 0.243$. Notice that this value is rather
close to, but slightly higher than, the corresponding value of 0.237 arising from
the analysis of Henecka et al. The same is true for the other entries in this table.
This means that significantly improving the theoretical performance of the HMM
algorithm (or indeed any HMM-style algorithm) whilst keeping the algorithm
efficient will not be possible. The experimental work of Henecka et al. gives results
up to a maximum $\delta$ of 0.20; compared to the capacity bound of 0.243, it appears
that there is some room for practical improvement in the symmetric case.
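
The threshold values for the symmetric channel can be reproduced numerically by solving $1 - H_2(\delta) = 1/m$. The following Python sketch does this by bisection; the function names and the tolerance are assumptions made for illustration.

```python
from math import log2

def h2(x):
    """Binary entropy function H_2(x), with H_2(0) = H_2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)

def bsc_capacity(delta):
    """Capacity of the binary symmetric channel with crossover probability delta."""
    return 1 - h2(delta)

def max_crossover(rate, tol=1e-9):
    """Largest delta in [0, 1/2] with bsc_capacity(delta) >= rate (bisection)."""
    lo, hi = 0.0, 0.5                 # capacity decreases from 1 to 0 on this interval
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bsc_capacity(mid) > rate:
            lo = mid
        else:
            hi = mid
    return lo

# Rate 1/m for private keys with m = 5, 3, 2 components.
limits = {m: max_crossover(1 / m) for m in (5, 3, 2)}
```

Rounded to three decimal places, the three limits agree with the values 0.243, 0.174 and 0.110 quoted for the symmetric channel.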

The Erasure Channel and the HS Algorithm. As noted above, for the
erasure channel, the capacity is $1 - \rho$, where $\rho$ is the fraction of bits erased by
the channel. Note that the list output by the HS algorithm is independent of
whether pruning is done after each lift or in one pass at the end (but obviously
doing so on a lift-by-lift basis is more efficient in terms of the total number of
candidates examined). Then considering the HS algorithm in its entirety (i.e.
over n/2 Hensel lifts), we see that it acts as nothing more than a list decoder
for the erasure channel, with the code C being the set of all $2^{n/2}$ words on mn/2
bits generated by doing n/2 Hensel lifts without any pruning, and the received
word r being the noisy version of the entire private key sk.

Then our analysis above applies to show that the HS algorithm will produce
an exponentially large output list, and will therefore be inefficient, when the
rate (which in this case is exactly 1/m) exceeds the capacity $1 - \rho$ of the erasure
channel with erasure fraction $\rho$. For m = 5, we have rate 0.2 and so our analysis
shows that the HS algorithm will produce an exponentially large output list
whenever $\rho$ exceeds 0.8. Now Heninger and Shacham report good results (in the
sense of having a reasonable running time) for $\rho$ as high as 0.76 (corresponding
to Heninger and Shacham's parameter $\delta$ being equal to 0.24), leaving a gap
between the experimental performance and the theoretical bound. Similar
remarks apply for the cases m = 2, 3: for m = 2, the HS algorithm should be
successful for $\rho = 0.43$ ($\delta = 0.57$), while the bound from capacity is 0.50; for
m = 3, we have $\rho = 0.58$ ($\delta = 0.42$) and the capacity bound is 0.67. Hence,
further improvements for m = 2, 3 are not ruled out by the capacity analysis.


Table 1. Private key type, equivalent rate R, and maximum crossover
probability $\delta$ allowing reliable key recovery, symmetric channel case.

sk                       R      $\delta$
(p, q, d, d_p, d_q)      1/5    0.243
(p, q, d)                1/3    0.174
(p, q)                   1/2    0.110


Table 2. Private key type, equivalent rate R, and maximum error
probability $\beta$ allowing reliable key recovery, Z-channel case.

sk                       R      $\beta$
(p, q, d, d_p, d_q)      1/5    0.666
(p, q, d)                1/3    0.486
(p, q)                   1/2    0.304


The Z-channel. We may also apply the above capacity analysis to the idealised
cold boot setting, where the crossover probabilities are of the form $(0, \beta)$. Here
we have a Z-channel, whose capacity can be written as:

$$C_Z(\beta) = \log_2\left(1 + (1 - \beta)\,\beta^{\beta/(1-\beta)}\right).$$

Solving the equation $C_Z(\beta) = R$ for R = 1/5, 1/3, 1/2 gives us the entries in
Table 2. We point out the large gap between these figures and what we would
expect to obtain both theoretically and experimentally if we were to directly
apply the HS algorithm to the idealised cold boot setting. For example, when
m = 5, the analysis of Heninger and Shacham suggests that key recovery should
be successful provided that $\beta$ does not exceed 0.46 (the value of $\delta = 0.27$
translates into a $\beta$ value of 0.46 using the formula $\delta = (1 - \beta)/2$ from the
literature), whereas the capacity analysis suggests a maximum $\beta$ value of 0.666.
This illustrates that the HS algorithm is not well-matched to the Z-channel. Our
new algorithm will close this gap substantially.
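
The Z-channel limits can be checked in the same way as for the symmetric channel. The sketch below evaluates the closed-form capacity $\log_2(1 + (1 - \beta)\beta^{\beta/(1-\beta)})$ and solves for the largest $\beta$ at each rate by bisection; the function names are assumptions for illustration.

```python
from math import log2

def z_capacity(beta):
    """Capacity of the Z-channel in which only 1 -> 0 flips occur, with probability beta."""
    if beta <= 0.0:
        return 1.0
    if beta >= 1.0:
        return 0.0
    return log2(1 + (1 - beta) * beta ** (beta / (1 - beta)))

def max_beta(rate, tol=1e-9):
    """Largest beta with z_capacity(beta) >= rate (capacity is decreasing in beta)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if z_capacity(mid) > rate:
            lo = mid
        else:
            hi = mid
    return lo

z_limits = {m: max_beta(1 / m) for m in (5, 3, 2)}

# Known-bit fraction 0.27 (erasure-style analysis) translated via delta = (1 - beta) / 2.
hs_beta = 1 - 2 * 0.27
```

The gap between `hs_beta` (0.46) and `z_limits[5]` (about 0.666) is the gap discussed in the text for m = 5.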


The True Cold Boot Setting. For the true cold boot setting, we must
consider the general case of a memoryless, binary channel with crossover
probabilities $(\alpha, \beta)$. We can calculate the capacity $C(\alpha, \beta)$ of this channel and
obtain the regions for which $C(\alpha, \beta) > R$ for R = 1/5, 1/3, 1/2. The results are
shown in Figure 1. Notice that these plots include as special cases the data from
Tables 1 and 2. If we set $\alpha = 0.001$, say, we see that the maximum achievable $\beta$
is quite close to that in the idealised cold boot setting. Note also that the plots are
symmetric about the lines y = x and y = 1 - x, reflecting the fact that capacity is
preserved under the transformations $(\alpha, \beta) \to (\beta, \alpha)$ and $(\alpha, \beta) \to (1 - \alpha, 1 - \beta)$.
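
For the general channel there is no need for a closed form: the capacity is the maximum of the mutual information over the input distribution, which is a concave one-dimensional problem. The following Python sketch uses a ternary search; the function names and iteration count are assumptions made for illustration, not the method used to produce the plots.

```python
from math import log2

def h2(x):
    """Binary entropy function, with H_2(0) = H_2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)

def mutual_information(x0, alpha, beta):
    """I(X;Y) when Pr(X=0) = x0, Pr(0 -> 1) = alpha and Pr(1 -> 0) = beta."""
    y1 = x0 * alpha + (1 - x0) * (1 - beta)        # Pr(Y = 1)
    return h2(y1) - x0 * h2(alpha) - (1 - x0) * h2(beta)

def capacity(alpha, beta, iters=200):
    """Capacity C(alpha, beta): maximise I(X;Y) over x0 by ternary search (I is concave in x0)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if mutual_information(a, alpha, beta) < mutual_information(b, alpha, beta):
            lo = a
        else:
            hi = b
    return mutual_information((lo + hi) / 2, alpha, beta)
```

Setting `alpha == beta` recovers the symmetric channel, and `alpha == 0` recovers the Z-channel, so the same routine covers all three settings.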

However, we must caution that capacity-based bounds for list decoding for the 
general binary non-symmetric channel (including the Z-channel) are not known 
in the coding-theoretic literature. Strictly speaking, then, our capacity analysis 
for this case does not bound the performance of key recovery algorithms that are 
allowed to output many key candidates, but only the limited class of algorithms 
that output a single key candidate. This said, our capacity analysis sets a target 
for our new algorithm, which follows. 

4 The New Algorithm and Its Analysis 

In this section, we give our new algorithm for noisy RSA key recovery that
works for any memoryless, binary channel, as characterised by the cross-over
probabilities $(\alpha, \beta)$. Our algorithm has the same basic structure as the HMM
algorithm but uses a different procedure to decide which candidate solutions to
retain and which to reject. Specifically, we use a likelihood measure in place of
Hamming distance.

Fig. 1. Plots showing achievable $(\alpha, \beta)$ pairs for private keys containing 5, 3 and 2
components, respectively. The vertical axis is $\beta$, the horizontal axis is $\alpha$. The shaded
area in each case represents the unachievable region.

Recall that we label the $M2^t$ candidate solutions on mt bits arising at some
stage in the HMM algorithm $s_1, \ldots, s_{M2^t}$ and let us name the corresponding
vector of mt noisy bits in the RSA private key r. Then the Maximum Likelihood
(ML) estimate for the correct candidate solution is simply:

$$\arg\max_{1 \le i \le M2^t} \Pr(s_i \mid r),$$

that is, the choice of i that maximises the conditional probability $\Pr(s_i \mid r)$. Using
Bayes' theorem, this can be rewritten as:

$$\arg\max_{1 \le i \le M2^t} \frac{\Pr(r \mid s_i)\,\Pr(s_i)}{\Pr(r)}.$$

Here, Pr(r) is a constant for a given set of bits r. Let us make the further
mild assumption that $\Pr(s_i)$ is also a constant, independent of i. Then the ML
estimate is obtained from

$$\arg\max_i \Pr(r \mid s_i) = \arg\max_i \left( (1 - \alpha)^{n^i_{00}}\, \alpha^{n^i_{01}}\, (1 - \beta)^{n^i_{11}}\, \beta^{n^i_{10}} \right)$$

where $\alpha = \Pr(0 \to 1)$ and $\beta = \Pr(1 \to 0)$ are the crossover probabilities, $n^i_{00}$
denotes the number of positions where $s_i$ and r both have 0 bits, $n^i_{01}$ denotes the
number of positions where $s_i$ has a 0 and r has a 1, and so on.

Equivalently, we may maximise the log of these probabilities, and so we seek:

$$\arg\max_i \log \Pr(r \mid s_i) = \arg\max_i \left( n^i_{00} \log(1 - \alpha) + n^i_{01} \log \alpha + n^i_{11} \log(1 - \beta) + n^i_{10} \log \beta \right)$$

which provides us with a simpler form for computational purposes.
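
The log-likelihood statistic is straightforward to compute from the four transition counts. The Python sketch below is a minimal illustration; the function names and the convention of returning minus infinity when a zero-probability transition is observed are assumptions of this sketch.

```python
from math import log, inf

def transition_counts(s, r):
    """Count positions by (candidate bit, noisy bit): keys '00', '01', '10', '11'."""
    n = {"00": 0, "01": 0, "10": 0, "11": 0}
    for sb, rb in zip(s, r):
        n[f"{sb}{rb}"] += 1
    return n

def log_likelihood(s, r, alpha, beta):
    """log Pr(r | s) with alpha = Pr(0 -> 1) and beta = Pr(1 -> 0)."""
    n = transition_counts(s, r)
    terms = [(n["00"], 1 - alpha), (n["01"], alpha),
             (n["11"], 1 - beta), (n["10"], beta)]
    total = 0.0
    for count, prob in terms:
        if count == 0:
            continue                   # unused transition contributes nothing
        if prob == 0.0:
            return -inf                # an impossible transition was observed
        total += count * log(prob)
    return total

# Two candidates at Hamming distance 1 from r, but with different error directions.
r = [1, 0, 0, 1]
cand_a = [1, 1, 0, 1]                  # explains r by one 1 -> 0 flip (likely)
cand_b = [0, 0, 0, 1]                  # explains r by one 0 -> 1 flip (unlikely)
best = max([cand_a, cand_b], key=lambda s: log_likelihood(s, r, 0.001, 0.4))
```

The example shows why a likelihood measure can separate candidates that Hamming distance cannot: both candidates differ from r in one position, but only one of the two error directions is plausible on an asymmetric channel.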
Algorithm 1. Pseudo-code for the maximum likelihood list decoding
algorithm for reconstructing RSA private keys.

    list <- slice(0);
    for stage = 1 to n/(2t) do
        Replace each entry in list with a set of 2^t candidate solutions
            obtained by Hensel lifting;
        Calculate the log-likelihood log Pr(r|s_i) for each entry s_i on list;
        Keep the L entries in list having the highest log-likelihoods and
            delete the remainder;
    Output list;


Then our proposed algorithm is simply this: select at each stage from the
candidates generated by Hensel lifting those L candidates $s_i$ which produce the
highest values of the log-likelihood $\log \Pr(r \mid s_i)$ as in the equation above. These
candidates are then passed to the next stage. So at each stage except the first
we will generate a total of $L2^t$ candidates and keep the best L. We may then
test each entry in the final list by trial encryption and decryption to recover a
single candidate for the private key. Pseudo-code for this algorithm is shown in
Algorithm 1. Note that here we assume there are n/(2t) stages; this number can
be halved as in the HS and HMM algorithms.
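
The stage structure can be sketched end to end for the simplest key type sk = (p, q). The Python code below is a toy illustration under stated assumptions: it lifts only p and q, scores each candidate on the new bits of the current stage, uses hypothetical names (`recover_pq`, `log_likelihood`) and a 12-bit modulus, and requires both crossover probabilities to be strictly between 0 and 1. It is not the authors' implementation.

```python
import heapq
from math import log

def bit(x, i):
    """Bit i of x (also correct for negative x, via two's complement)."""
    return (x >> i) & 1

def log_likelihood(s_bits, r_bits, alpha, beta):
    """log Pr(r | s) for a memoryless binary channel with 0 < alpha, beta < 1."""
    total = 0.0
    for s, r in zip(s_bits, r_bits):
        if s == 0:
            total += log(alpha) if r == 1 else log(1 - alpha)
        else:
            total += log(beta) if r == 0 else log(1 - beta)
    return total

def recover_pq(N, nbits, r_p, r_q, alpha, beta, t, L):
    """List-decode (p, q) from noisy copies r_p, r_q, keeping L candidates per stage."""
    cands = [(1, 1)]                          # slice(0): p and q are both odd
    lo = 1
    while lo < nbits:
        hi = min(lo + t, nbits)               # this stage lifts bits lo .. hi-1
        for j in range(lo, hi):               # each Hensel lift doubles the list
            lifted = []
            for p, q in cands:
                c = bit(N - p * q, j)         # required parity of p[j] + q[j]
                for pb in (0, 1):
                    lifted.append((p | (pb << j), q | ((pb ^ c) << j)))
            cands = lifted
        window = range(lo, hi)
        r_bits = [bit(r_p, j) for j in window] + [bit(r_q, j) for j in window]

        def score(pq):
            s_bits = [bit(pq[0], j) for j in window] + [bit(pq[1], j) for j in window]
            return log_likelihood(s_bits, r_bits, alpha, beta)

        cands = heapq.nlargest(L, cands, key=score)   # keep the L most likely
        lo = hi
    return cands

p, q = 61, 53
# The noisy copy of p has bit 1 flipped (61 -> 63); q is received without error.
survivors = recover_pq(p * q, 6, r_p=63, r_q=q, alpha=0.1, beta=0.1, t=2, L=4)
key = [c for c in survivors if c[0] * c[1] == p * q]   # final check by multiplication
```

Here the final check multiplies the candidates instead of performing a trial encryption and decryption, which is sufficient when only p and q are being recovered.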

Our algorithm has fixed running time $O(L2^t)$ for each of the n/(2t) stages, and
fixed memory consumption $O(L2^t)$. This is a consequence of choosing to keep
the L best candidates at each stage in place of all candidates surpassing some
threshold as in the HMM algorithm. The memory consumption can be reduced to
$O(L + t)$ by using a depth-first approach to generating and filtering the candidates.
The main overhead is then the Hensel lifting to generate candidate solutions; the
subsequent computation of log-likelihoods for each candidate is relatively cheap.
Notice that if $\alpha = 0$ (as in the Z-channel for an idealised cold boot setting), then
any instance of a 0 -> 1 bit flip is very heavily penalised by the log-likelihood
statistic: it adds a $-\infty$ term to $\log \Pr(r \mid s_i)$. In practice, for $\alpha = 0$, we just reject
any solution containing a 0 -> 1 transition. For the erasure channel, we reject any
candidate solution that does not match r in the known bits.

A special case of our algorithm arises when L = 1 and corresponds to just 
keeping the single ML candidate at each stage. This algorithm then corresponds 
to Maximum Likelihood (ML) decoding. However, at a given stage, it is likely 
that the correct solution will be rejected because a wrong solution happens to 
have the highest likelihood. This is especially so in view of how similar some 
candidates will be to the correct solution. Therefore, ML decoding is likely to 
go awry at some stage of the algorithm. 


4.1 Remarks on the Asymptotic Analysis of Our Algorithm 

In the full version, we give two analyses of our algorithm, using tools from coding
theory to assist us. The first analysis uses a strong randomness assumption, that
the $L2^t$ candidates $s_i$ generated at each stage of Algorithm 1 are independent and
uniformly random mt-bit vectors. It shows that, asymptotically, our algorithm
will be successful in recovering the RSA private key provided 1/m is less than the
capacity of the memoryless, binary channel with crossover probabilities $(\alpha, \beta)$.
In fact, this result follows as a simple application of Shannon's noisy-channel
coding theorem [13], which states that, asymptotically, the use of random codes
in combination with Maximum Likelihood (ML) decoding achieves arbitrarily
small decoding error probability, provided that the code rate stays below the
capacity of the channel. Unfortunately, it is easy to see that our strong
randomness assumption is in fact not true for the codes C generated in our algorithm,
because of the iterative nature of the Hensel lifting. The second analysis proves
a similar result for the symmetric case under weaker randomness assumptions
for which we have good experimental evidence. Details can be found in the full
version.


Table 3. Success probabilities for the symmetric case ($(\alpha, \beta) = (\delta, \delta)$). Experiments
with $\delta < 0.16$ are based on 500 trials. Capacity bound on $\delta$ is 0.243.

$\delta$               0.08    0.10    0.12    0.14    0.16    0.18    0.19    0.2     0.21    0.22
t                      6       8       10      12      16      18      18      18      18      18
L                      4       4       8       32      32      32      32      32      32      64
Success rate           1       0.921   0.932   0.963   0.84    0.60    0.38    0.20    0.08    0.04
Time per trial (ms)    113     98      474     4323    85662   395069  399451  380139  377342  722341


5 Experimental Results 

For our experiments, we used a multi-threaded implementation based on Java
code kindly supplied by the authors of the HMM algorithm. We ran our code on
an 8x virtual CPU hosted on a 2x Intel Xeon X5650, clocked at 2.67 GHz (IBM
BladeCenter HS22V). Except where noted below, our experiments were run for
100 trials using a randomly-generated RSA key for each trial. Except where
noted, our results refer to private keys of the form sk = (p, q, d, $d_p$, $d_q$) and are
all for 1024-bit RSA moduli.

We have conducted extensive experiments for the symmetric case considered
by Henecka et al. Our results are shown in Table 3. For small values of $\delta$, we
achieve a success rate of 1 or very close to 1 using only moderate amounts of
computation. By contrast the HMM algorithm does not achieve such a high
success rate for small $\delta$. This cannot be solved by increasing t in the HMM
algorithm because this leads to a blow-up in running time. For larger $\delta$, the success
rate of our algorithm is comparable to that of Henecka et al. for similar values of
t. We were able to obtain a non-zero success rate for $\delta = 0.22$, while Henecka et
al. only reached $\delta = 0.20$. The bound from capacity is 0.243.

For the idealised cold boot setting where $\alpha = 0$, our experimental results are
shown in Table 4. Recall that the HS algorithm can also be applied to this case.
Translating the fraction of known bits $(1 - \rho)$ to the idealised cold boot setting,
and assuming the HS algorithm works just as well when only 1 bits are known
(instead of a mixture of 0 and 1 bits), the maximum value of $\beta$ that could be
handled by the HS algorithm theoretically would be 0.46 (though results reported
by Heninger and Shacham would allow $\beta$ as high as 0.52). Our algorithm still has
a reasonable success rate for $\beta$ as high as 0.6 and non-zero success rate even for
$\beta = 0.63$, beating the HS algorithm by some margin. Our capacity analysis for
this case suggests that the maximum value of $\beta$ will be 0.666. Thus our algorithm
is operating within 5% of capacity here.


Table 4. Success probabilities for the idealised cold boot case ($\alpha = 0$). Capacity bound
on $\beta$ is 0.666.

$\beta$                0.1     0.2     0.3     0.4     0.46    0.5     0.55    0.6     0.62    0.63
t                      6       6       8       12      16      18      18      18      18      18
L                      4       4       8       8       8       16      16      16      64      64
Success rate           1       1       1       0.98    0.87    0.81    0.43    0.13    0.07    0.03
Time per trial (ms)    69      88      147     1518    22349   292834  282235  290254  692532  683421


Table 5. Success probabilities for the true cold-boot case with $\alpha = 0.001$. Capacity
bound on $\beta$ is 0.658.

$\beta$                0.1     0.2     0.3     0.4     0.5     0.55    0.6     0.61
t                      6       6       8       12      16      18      18      18
L                      4       4       8       8       16      32      64      64
Success rate           1       1       0.97    0.97    0.66    0.31    0.09    0.04
Time per trial (ms)    80      80      273     4268    42732   384262  740244  735169


Table 6. Success probabilities for the true cold-boot case with $\alpha = 0.001$ and sk =
(p, q, d). Capacity bound on $\beta$ is 0.479.

$\beta$                0.1     0.15    0.20    0.25    0.30    0.35    0.40    0.43
t                      6       10      14      16      18      18      18      18
L                      4       16      16      16      16      16      32      64
Success rate           0.99    0.99    0.98    0.96    0.63    0.55    0.12    0.04
Time per trial (ms)    46      371     4441    19906   117502  108523  165418  301457


Table 7. Success probabilities for the true cold-boot case with $\alpha = 0.001$ and sk =
(p, q). Capacity bound on $\beta$ is 0.298.

$\beta$                0.05    0.1     0.15    0.20    0.26
t                      10      12      16      18      18
L                      8       8       16      32      64
Success rate           0.95    0.83    0.68    0.29    0.06
Time per trial (ms)    404     904     9492    87273   217214
We present experimental results for the true cold boot setting in Table 5.
Given $\alpha = 0.001$, it follows from our asymptotic analysis that the theoretical
maximum value of $\beta$ which can be handled by our algorithm is 0.658. Our
algorithm still has a non-zero success rate for $\beta$ as high as 0.61. We reiterate
that this true cold boot setting is not handled by any of the algorithms previously
reported in the literature.

Furthermore, for private keys of the form sk = (p, q, d) and sk = (p, q), our
algorithm performs very well in the true cold boot setting. For sk = (p, q, d), the
maximum value of $\beta$ suggested by our capacity analysis is 0.479. With $\beta = 0.4$,
t = 20 and L = 16 our success rate is 0.12 and we have non-zero success rate
even with $\beta = 0.43$. Similarly, when sk = (p, q) our capacity analysis shows that
the maximum $\beta$ is 0.298. When $\beta = 0.2$, t = 18 and L = 16 we still have a
success rate of 0.29, but we can even recover keys with non-zero success rate for
$\beta$ as high as 0.26. Tables 6 and 7 show our results for these cases.

In the full version, we report further results for the erasure channel that
improve on the results of Heninger and Shacham and nearly close the gap to our
capacity bound. For example, when m = 5, we can achieve reliable key recovery
up to an erasure rate of 0.79 for this channel, where the bound from capacity is
0.80. By contrast, the best result reported by Heninger and Shacham is for erasure
rate 0.76. These and other improvements are obtained using an optimised C
implementation of a depth-first search.


6 Conclusions 

We have introduced a coding-theoretic viewpoint to the problem of recovering
an RSA private key from a noisy version of the key. This provides new insights
on the HS and HMM algorithms and leads to a new algorithm which is efficiently
implementable and enjoys good performance at high error rates. In particular,
ours is the first algorithm that works for the true cold boot case, where both
Pr(0 -> 1) and Pr(1 -> 0) are non-zero. Our algorithm is amenable to asymptotic
analysis, and our experimental results indicate that this analysis provides a good
guide to what is actually achievable with reasonable computing resources. Open
problems include:

1. Developing a rigorous asymptotic analysis of our algorithm in the general 
case. However, in view of the state-of-the-art in list decoding, this seems to 
be hard to obtain. 

2. Generalising our approach to the situation where soft information is available 
about the private key bits, for example reliability estimates of the bits. In 
general, and by analogy with the situation in the coding theory literature, one 
would expect to achieve better performance by exploiting such information. 
Abstract. We propose an algorithm that, given an arbitrary N of unknown
factorization and prime $e > N^{1/4+\epsilon}$, certifies whether the RSA
function $\mathrm{RSA}_{N,e}(x) := x^e \bmod N$ defines a permutation over $\mathbb{Z}_N^*$ or not.
The algorithm uses Coppersmith's method to find small solutions of polynomial
equations and runs in time $O(\epsilon^{-8} \log^2 N)$. Previous certification
techniques required e > N.

Keywords: RSA, certified trapdoor permutations, Coppersmith. 


1 Introduction 

One of the best-known cryptographic primitives is the RSA function [25].
Given a public modulus N (which is usually the product of two primes) and
an exponent e, it is defined as $\mathrm{RSA}_{N,e} : \mathbb{Z}_N^* \to \mathbb{Z}_N^*$, $x \mapsto x^e \bmod N$. It is well
known that the RSA function defines a permutation over the domain $\mathbb{Z}_N^*$ iff
$\gcd(e, \varphi(N)) = 1$. Furthermore, with the right choice of parameters, the RSA
function even defines a trapdoor permutation since the prime factorization of N
allows one to efficiently invert $\mathrm{RSA}_{N,e}$.
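
The permutation criterion can be verified exhaustively for a small modulus. The Python sketch below is an illustration only: it computes $\varphi(N)$ by brute force, which is exactly what a verifier cannot do for a real modulus of unknown factorization, and the function names and the choice N = 35 are assumptions.

```python
from math import gcd

def units(N):
    """Elements of Z_N^*, i.e. residues coprime to N."""
    return [x for x in range(1, N) if gcd(x, N) == 1]

def euler_phi(N):
    """Euler's totient by brute force (toy sizes only)."""
    return len(units(N))

def is_permutation(N, e):
    """True iff x -> x^e mod N is a bijection on Z_N^*."""
    zn = units(N)
    return len({pow(x, e, N) for x in zn}) == len(zn)

N = 35                                  # 5 * 7, so phi(N) = 24
phi = euler_phi(N)
results = {e: is_permutation(N, e) for e in range(1, 30)}
```

For this modulus, e = 5 yields a permutation while e = 3 does not, since 3 divides $\varphi(35) = 24$.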

Trapdoor permutations have many applications to public-key cryptosystems
and serve as a building block for (often quite complex) cryptographic protocols.
In a large number of applications of trapdoor functions, the fact that the
function is a permutation is required to be publicly verifiable. The importance
of trapdoor permutations with an efficient permutation checking procedure was
first noted by Bellare and Yung, who called them certified trapdoor permutations.
Certified trapdoor permutations are in particular important in scenarios
where one party (for example, the prover) sends a description of a trapdoor
permutation to another party (for example, the verifier). A dishonest prover may
send a malicious description of a trapdoor function which is not a permutation.
If this remains unnoticed by the verifier, it may allow the prover to cheat
in the protocol. See Section 1.2 for a list of applications of certified trapdoor
permutations.

RSA as a certified trapdoor permutation. The question whether the
RSA function is a certified trapdoor permutation was first addressed by Bellare
and Yung, who wrote:

X. Wang and K. Sako (Eds.): ASIACRYPT 2012, LNCS 7658, pp. 404- gTT] 2012. 

(c) International Association for Cryptologic Research 2012 
    In particular, RSA is (probably) not certified [...]. This is because [...]
    the (description of) the trapdoor permutation f includes a number which
    is a product of two primes, and there is (probably) no polynomial time
    procedure to test whether or not a number is a product of two primes.

To overcome this problem, Bellare and Yung showed that every trapdoor
permutation can be transformed into a certified trapdoor permutation by presenting
pre-images (under the function) of random elements specified in a common
reference string (CRS), hence certifying that the function is (almost) a permutation.
While this result is certainly interesting at a theoretical level, the Bellare-Yung
transformation has two main disadvantages. First, it comes with an additional
computational overhead (consisting of a number of evaluations of the function)
and is therefore relatively inefficient. Second, in order to keep the same data
structures one would prefer that the initial trapdoor function (e.g., RSA)
can be certified directly, without any additional overhead such as a CRS or
pre-images. Related transformations for RSA have also been proposed in prior work.

Subsequently, two results were obtained about the direct certifiability of RSA,
i.e., without using a CRS and expanding the public description. First, |5I20|
observed that if $e > N$ and $e$ is prime, then the RSA function $\mathrm{RSA}_{N,e}$ is a
certified permutation. (This holds because a prime $e > N$ can never divide
$\varphi(N) < N$, and hence $\gcd(e, \varphi(N)) = 1$.) However, choosing $e > N$ is usually
avoided in practice due to the costs for modular exponentiation. Second, Kiltz
et al. H9 noted that if $e < N^{1/4}$, then $\mathrm{RSA}_{N,e}$ is a lossy trapdoor permutation
m (under the phi-Hiding Assumption U) and hence it cannot be certified.
This is because a lossy trapdoor permutation is in some sense the opposite
of a certified trapdoor permutation: an honestly generated $(N, e)$ with $N = pq$
and $\gcd(e, \varphi(N)) = 1$ cannot be efficiently distinguished from $(N, e)$ for which
$\mathrm{RSA}_{N,e}$ is many-to-1 and hence not a permutation.

To summarize, if $e < N^{1/4}$, then the RSA function is lossy and cannot be
certified (unless the phi-hiding assumption is wrong); if $e > N$, then it is certified
0213 ; if $N^{1/4} < e < N$, nothing is known and therefore generic Bellare-Yung
NIZK proofs P] have to be added to certify RSA.

1.1 Our Results 

In this work we close the above gap by showing an efficient certification procedure
that works for any prime exponent $e > N^{1/4}$. Concretely, we construct an
algorithm that, given an arbitrary modulus $N$ (with unknown factorization) and
a prime $e \geq N^{1/4+\epsilon}$, returns 1 iff $\mathrm{RSA}_{N,e}$ defines a permutation over $\mathbb{Z}_N^*$. The
running time of the algorithm is $O(\epsilon^{-8} \log^2 N)$ bit operations plus additional
$O(\log^4 N)$ if $e$ needs to be checked for primality.

Our Certification Algorithm. The idea of our new certification algorithm
is as follows. The $\mathrm{RSA}_{N,e}$ function defines a permutation over $\mathbb{Z}_N^*$ iff $e$ does not
divide $\varphi(N)$. Hence, given $N, e$, our goal is to decide whether $\gcd(e, \varphi(N)) = 1$ or
not. First, we use Coppersmith’s algorithm |8I2 1 j to find prime divisors $p$ of $N$
in a specific range. Concretely, our algorithm FindFactor run with parameter $\beta$
successfully identifies if a given prime $e > N^{1/4+\epsilon}$ divides $p - 1$ iff there exists
a divisor $p$ of $N$ in the range $[N^{\beta}, N^{\beta^2 + 1/4 + \epsilon}]$. If we could assume that $N = pq$
is the product of two primes, both of size roughly $N^{1/2}$, then we could run
FindFactor with parameter $\beta = 1/2$ to identify whether $e$ divides $\varphi(N)$ or not.
However, the certification algorithm has to view $N$ as an arbitrary integer with
unknown factorization. If $N = pq$ with $p \approx N^{2/3}$ and $q \approx N^{1/3}$, then FindFactor
run with parameter $\beta = 1/2$ does not work any more. To get around this, we run
the FindFactor algorithm multiple times (with different parameters $\beta$) to check
for various ranges of the prime factors of $N$. Our main technical contribution is to
show that the number of invocations of FindFactor in our certification algorithm
is $\mathrm{poly}(1/\epsilon)$ if $e > N^{1/4+\epsilon}$.

Extensions. Our certification algorithm works only for prime $e$ but it can be
extended to the case where the factorization of $e = \prod_i e_i$ is known. In that case
we can give an efficient certification procedure if $e_i > N^{1/4+\epsilon}$ for all $i$. If, for
one $i$, we have $e_i < N^{1/4}$, then $\mathrm{RSA}_{N,e}$ is (at least) $e_i$-to-1 (lossy) under the
phi-hiding assumption. Extending our methods to work with arbitrary integers
$e$ of unknown factorization remains an open problem.


1.2 Certified Trapdoor Permutations and Applications 

The only known candidate trapdoor permutations are the (factoring-based)
Blum-Blum-Shub permutation £1] , the RSA permutation m, and Paillier E33.
Since the Blum-Blum-Shub function is lossy assuming one cannot distinguish
$N = pq$ from $N = pqr$ |22I1 2j , the RSA trapdoor function is the most efficient
certified trapdoor permutation currently known. Our results show that one can
use RSA with prime $e = N^{1/4+\epsilon}$ (rather than $e > N$) as a certified trapdoor
permutation.

We now mention a number of cryptographic protocols that use certified
(rather than standard) trapdoor permutations as a building block. Most
importantly, NIZK protocols for any NP-statement can be built from
(doubly-enhanced) certified trapdoor permutations jl 111 711 5lT5j . Since the RSA
trapdoor permutation is doubly-enhanced m, we obtain simplified and more efficient
NIZK protocols from the RSA assumption (with $e > N^{1/4}$) that do not suffer
from the Bellare-Yung certification overhead. Apart from that, m used certified
trapdoor permutations to construct ZAPs and verifiable PRFs; m to construct
round-optimal blind signatures; |20ll| to build sequential aggregate signatures.
We stress that requiring the trapdoor permutation to be certified is not only
an artifact of the security proofs. In almost all cases the use of a lossy trapdoor
permutation leads to a concrete attack on the scheme. For example, the security
of the RSA-based aggregate signature scheme of 1211 can be broken (assuming
the Phi-Hiding Assumption) when instantiated with $e < N^{1/4}$ (e.g., using the
common choices $e = 3$ or $e = 2^{16} + 1$). The same holds for the NIZK protocols
for any NP statement m. Recently, m showed that a full-domain hash
impossibility result by Coron jOj only holds if the trapdoor function is certified.
2 Definitions

2.1 Notation 

We denote our security parameter as $k$. For all $n \in \mathbb{N}$, we denote by $1^n$ the $n$-bit
string of all ones. For any element $x$ in a set $S$, we use $x \in_R S$ to indicate that
we choose $x$ uniformly at random from $S$. We denote the set of prime numbers
by $\mathbb{P}$ and the set of $n$-bit prime numbers by $\mathbb{P}_n$. We denote by $\mathbb{Z}_N^* = \{x \in \mathbb{Z}_N :
\gcd(x, N) = 1\}$ the multiplicative group modulo an integer $N$. All logarithms
are base 2 unless otherwise stated.

2.2 Families of Permutations 

Definition 1. A family of permutations $\Pi$ = (Gen, Eval) consists of the following
two polynomial-time algorithms.

1. A probabilistic algorithm Gen, which on input $1^k$ outputs a public description
pub which includes an efficiently sampleable domain $\mathrm{Dom}_{pub}$.

2. A deterministic algorithm Eval, which on input pub and $x \in \mathrm{Dom}_{pub}$, outputs
$y \in \mathrm{Dom}_{pub}$. We write $f(x)$ = Eval(pub, $x$).

We require that for all $k \in \mathbb{N}$ and all pub output by Gen($1^k$), Eval(pub, $\cdot$) defines
a permutation over $\mathrm{Dom}_{pub}$.

Definition 1 extends to families of trapdoor permutations, where Gen additionally
outputs a trapdoor trap which can be used by a deterministic polynomial-time
algorithm Invert to compute $f^{-1}(y)$, for any $y \in \mathrm{Dom}_{pub}$.

We want to point out that Eval(pub, $\cdot$) is only required to be a permutation
for correctly generated pub; not every bit-string pub yields a permutation.
A family of permutations $\Pi$ is said to be certified JSj if the fact that it is a
permutation can be verified in polynomial time given pub.

Definition 2. CP = (Gen, Eval, Certify) is called a family of certified permutations
if (Gen, Eval) is a family of permutations and Certify is a deterministic
polynomial-time algorithm that, on input of $1^k$ and an arbitrary pub (potentially
not generated by Gen), returns 1 iff Eval(pub, $\cdot$) defines a permutation over
$\mathrm{Dom}_{pub}$.

Definition 2 also extends to families of certified trapdoor permutations.

We remark that Definition 2 follows m and is slightly weaker than that of
Bellare and Yung 0, where, for all inputs, the Certify algorithm is required to
return 1 iff pub was generated by Gen($1^k$), with some constant error probability
(in the sense of a BPP algorithm).¹ In fact, it seems that the certification
constructions by Bellare and Yung [3, Section 3] only meet our weaker definition
which is, in particular, sufficient for their applications to NIZK for all NP
languages.

¹ The difference between the two definitions can be explained for the case of RSA.
Suppose the original Gen algorithm outputs pub = $(N = pq, e)$ with $\gcd(e, \varphi(N)) = 1$.
This cannot define a certified permutation with respect to the Bellare-Yung
definition, since if pub' = $(N' = pqr, e')$ with $\gcd(e', \varphi(N')) = 1$ then pub $\approx$ pub'
under the 2-vs-3 prime assumption but pub' is never output by Gen. However, since
$\gcd(e', \varphi(N')) = 1$, $\mathrm{RSA}_{N',e'}$ defines a permutation so there is some hope that it still
meets Definition 2.

2.3 RSA Trapdoor Permutation 

In Figure 1 we give a description of a family of trapdoor permutations $\mathrm{RSA}_\gamma$ =
(RSAGen$_\gamma$, RSAEval, RSAInvert), parametrized by some function $\gamma > 0$ (which
controls the size of the exponent $e \approx N^\gamma$). The domain is defined as $\mathrm{Dom}_{pub} =
\mathbb{Z}_N^*$.


algorithm RSAGen_γ(1^k)            algorithm RSAEval(pk, x)    algorithm RSAInvert(td, y)
  p, q ∈_R P_{k/2}                   return x^e mod N            return y^d mod N
  N = pq
  repeat
    e ∈_R P_{γk}
  until (gcd(e, φ(N)) = 1)
  d = e^{-1} mod φ(N)
  return (pk = (N, e), td = d)


Fig. 1. RSA permutation algorithms
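The three algorithms of Figure 1 can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: primality is tested by trial division (adequate only for tiny parameters), the retry that forces p ≠ q is an addition of this sketch, and `rsa_invert` takes the public key as an extra argument because the trapdoor `d` alone does not carry the modulus.

```python
import random
from math import gcd

def is_prime(n):
    """Trial division; only suitable for the toy sizes used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def random_prime(bits, rng):
    """Uniform prime with exactly `bits` bits, by rejection sampling."""
    while True:
        candidate = rng.randrange(1 << (bits - 1), 1 << bits)
        if is_prime(candidate):
            return candidate

def rsa_gen(k, gamma, rng):
    """RSAGen_gamma(1^k): two k/2-bit primes and a prime e of about gamma*k bits."""
    p = random_prime(k // 2, rng)
    q = random_prime(k // 2, rng)
    while q == p:                      # assumption of this sketch: distinct primes
        q = random_prime(k // 2, rng)
    N, phi = p * q, (p - 1) * (q - 1)
    e_bits = max(2, int(gamma * k))
    while True:
        e = random_prime(e_bits, rng)
        if gcd(e, phi) == 1:
            break
    d = pow(e, -1, phi)                # modular inverse (Python 3.8+)
    return (N, e), d

def rsa_eval(pk, x):
    N, e = pk
    return pow(x, e, N)

def rsa_invert(pk, d, y):
    N, _ = pk
    return pow(y, d, N)

rng = random.Random(1)
pk, td = rsa_gen(32, 0.3, rng)
x = 12345
print(rsa_invert(pk, td, rsa_eval(pk, x)) == x)
```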


3 RSA Certification Algorithm 

In this section we will give a certification algorithm for the RSA trapdoor
permutation $\mathrm{RSA}_\gamma$ from Section 2.3. Our algorithm can be derived from the following
main theorem.

Theorem 3. Let $N$ be an integer of unknown factorization and $e < N$ be a
prime integer such that $\gamma = \log_N e = \frac{1}{4} + \epsilon$ and $\gcd(e, N) = 1$. We can decide if
$\gcd(e, \varphi(N)) = 1$ or $\gcd(e, \varphi(N)) = e$ in time $O(\epsilon^{-8} \log^2 N)$.

Proof. Let us write $N = \prod_i p_i^{\alpha_i}$ with primes $p_i$. Therefore,

$$\varphi(N) = \prod_i p_i^{\alpha_i - 1}(p_i - 1).$$

Since $e$ is prime, we can only have $\gcd(e, \varphi(N)) = 1$ or $\gcd(e, \varphi(N)) = e$. In the
latter case, we must have $e \mid \varphi(N)$. If $e > N$ then we know that $\gcd(e, \varphi(N)) = 1$
EDI. When $e < N$, we need to perform some further checks.

Let us look at the case $e \mid \varphi(N)$. If $e \mid p_i^{\alpha_i - 1}$ then $\gcd(e, N) = e$, which contradicts
the prerequisite that $e$ and $N$ are coprime. Hence we must have $e \mid (p_i - 1)$ for
some $i$. Let us denote $p = p_i$. There exists an $x_0 \in \mathbb{N}$ s.t.

$$e x_0 + 1 = p.$$
Our goal is to recover $x_0$ and thus to find $p$. Notice that $x_0$ is a small root of the
polynomial equation $f(x) = ex + 1$ modulo $p$.

This allows us to use Coppersmith’s algorithm for finding small roots of modular
polynomial equations.


Theorem 4 (Coppersmith). Let $N$ be an integer of unknown factorization,
which has a divisor $p \geq N^{\beta}$, $0 < \beta \leq 1$. Let $0 < \mu \leq \frac{1}{7}\beta$. Furthermore, let $f(x)$
be a univariate monic polynomial of degree $\delta$. Then we can find all solutions $x_0$
for the equation

$$f(x_0) = 0 \bmod p \quad \text{with} \quad |x_0| \leq N^{\frac{\beta^2}{\delta} - \mu}.$$

This can be achieved in time $O(\mu^{-7} \delta^5 \log^2 N)$. The number of solutions $x_0$ is
bounded by $O(\mu^{-1} \delta)$.

A proof can be found in m

We use Coppersmith’s algorithm to find prime divisors $p$ of $N$ in a specific
range as specified in the following lemma.

Lemma 5. Let $N$ be an integer of unknown factorization with divisor $p \geq N^{\beta}$
for some $\beta \in (0, 1]$. Let $\mu \in (0, \frac{\beta}{7}]$. Further, let $e = N^{\gamma}$ with $e \mid p - 1$. Then there is
an algorithm FindFactor that on input $N, e, \beta, \mu$ outputs $p$ in time $O(\mu^{-7} \log^2 N)$
provided that

$$p \leq N^{\beta^2 + \gamma - \mu}.$$

If FindFactor cannot find a non-trivial factor of $N$, it outputs $\bot$.


Proof. Since $e \mid p - 1$, we have $e x_0 = p - 1$ for some $x_0 \in \mathbb{N}$. Thus the polynomial
$f(x) = ex + 1$ has the root $x_0$ modulo $p$. Multiplication of $f(x)$ by $e^{-1}$ modulo
$N$ gives us a monic polynomial with the same root modulo $p$. Let us bound the
size of our desired root $x_0$. We have

$$x_0 = \frac{p - 1}{e} < \frac{p}{e} \leq \frac{N^{\beta^2 + \gamma - \mu}}{N^{\gamma}} = N^{\beta^2 - \mu}.$$

Thus we can recover $x_0$ by Theorem 4 in time $O(\mu^{-7} \log^2 N)$. Also by Theorem 4,
the number of candidates for $x_0$ is bounded by $O(\mu^{-1})$. For every candidate we
check whether $\gcd(e x_0 + 1, N)$ gives us the divisor $p$. This can be done in time
$O(\mu^{-1} \log^2 N)$, which concludes the proof.
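The relation $e x_0 + 1 = p$ behind FindFactor can be illustrated with an exhaustive search over $x_0$. This is explicitly not Coppersmith's algorithm: the sketch below replaces the lattice-based root finding by brute force, so it only works for toy moduli. The function name `find_factor_bruteforce` and the bound `max_x0` are hypothetical, and the primality check on each candidate is an addition of this sketch that rules out composite divisors congruent to 1 modulo e.

```python
def is_prime(n):
    """Trial division; only suitable for toy sizes."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def find_factor_bruteforce(N, e, max_x0):
    """Search for a prime divisor p of N of the form p = e*x0 + 1.

    Exhaustive stand-in for FindFactor: it tries every x0 up to max_x0
    instead of recovering the small root with Coppersmith's method.
    Returns p, or None when no such divisor exists in the searched range.
    """
    for x0 in range(1, max_x0 + 1):
        p = e * x0 + 1
        if p > N:
            break
        if N % p == 0 and is_prime(p):
            return p
    return None

print(find_factor_bruteforce(253, 5, 50))   # 253 = 11 * 23 and 5 divides 11 - 1
print(find_factor_bruteforce(91, 5, 50))    # 91 = 7 * 13, neither 6 nor 12 is divisible by 5
```

A hit means $e$ divides $p - 1$ and therefore $\varphi(N)$, so the RSA map with that exponent is not a permutation.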


Lemma 5 can be used to check whether $e \mid p - 1$ for some prime divisor $p$ in the
range $[N^{\beta}, N^{\beta^2 + \gamma - \mu}]$. Our goal is to check whether $e \mid p - 1$ for some $p$ in the
entire range $[e, N]$, which we will call the target range.

Obviously $p \leq N$. Thus, we can set the upper bound to $\beta^2 + \gamma - \mu = 1$. This
in turn implies a lower bound of $\beta = \sqrt{1 - (\gamma - \mu)}$. Hence, we can first search
for a divisor $p$ in the interval $[N^{\sqrt{1 - (\gamma - \mu)}}, N]$. If we do not find a divisor $p$ in
this interval, then we know that any divisor $p$ must satisfy $p < N^{\sqrt{1 - (\gamma - \mu)}}$. This
defines a new upper bound, and in turn a new lower bound.
In total, we cover the target range by a sequence of intervals $[N^{\beta_1}, N^{\beta_0}], \dots,
[N^{\beta_n}, N^{\beta_{n-1}}]$ where the $\beta_i$ are defined by the recurrence relation

$$\beta_{i+1} = \max\left\{\sqrt{\beta_i - (\gamma - \mu)},\ \gamma\right\} \quad \text{with } \beta_0 = 1.$$

Two examples of such an interval sequence are illustrated in Figure 2.

The following lemma shows that our recurrence reaches $\gamma$ and thus covers the
target range $[e, N]$ after a certain number of steps.

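Lemma 6. Let $\frac{1}{4} < \gamma - \mu < \gamma < 1$. Then the recurrence relation

$$\beta_{i+1} = \max\left\{\sqrt{\beta_i - (\gamma - \mu)},\ \gamma\right\} \quad \text{with } \beta_0 = 1$$

satisfies $\beta_k = \gamma$ for some $k \leq \left\lceil \frac{1 - \gamma}{\gamma - \mu - \frac{1}{4}} \right\rceil + 1$.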

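Proof. Since by definition $\gamma \leq \beta_i \leq 1$ for all $i$ and $\mu > 0$, we have $\beta_i - (\gamma - \mu) > 0$
and therefore $\sqrt{\beta_i - (\gamma - \mu)}$ is defined in $\mathbb{R}$.

We now show by induction that the sequence of the $\beta_i$ is monotone decreasing.
Let us start with $\beta_1 \leq \beta_0$. Since $\beta_0 - (\gamma - \mu) < 1$, we have $\max\{\sqrt{\beta_0 - (\gamma - \mu)},
\gamma\} < 1$ and therefore $\beta_1 < \beta_0$.

Our inductive hypothesis is $\beta_i \leq \beta_{i-1}$ for all $i \leq n$. Now $\beta_n \leq \beta_{n-1}$ implies

$$\beta_n - (\gamma - \mu) \leq \beta_{n-1} - (\gamma - \mu)$$

and therefore by monotonicity of the square root function

$$\sqrt{\beta_n - (\gamma - \mu)} \leq \sqrt{\beta_{n-1} - (\gamma - \mu)}.$$

This yields $\max\{\sqrt{\beta_n - (\gamma - \mu)}, \gamma\} \leq \max\{\sqrt{\beta_{n-1} - (\gamma - \mu)}, \gamma\}$. Thus, $\beta_{n+1} \leq \beta_n$.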


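Since the sequence of the $\beta_i$ is monotone decreasing and bounded below by $\gamma$,
it converges. Now we show that we can upper bound the number $k - 1$ of intervals
$[\beta_i, \beta_{i-1}]$, $1 \leq i < k$, for which $\beta_i > \gamma$. This implies that our sequence stabilizes
after $k$ steps at the point $\beta_k = \gamma$.

Let us define a function $\Delta(\beta_{i-1}) = \beta_{i-1} - \beta_i > 0$, which gives us the length of
the $i$-th interval. For $\beta_i > \gamma$ we obtain $\Delta(\beta_{i-1}) = \beta_{i-1} - \sqrt{\beta_{i-1} - (\gamma - \mu)}$. Since
the first two derivatives of $\Delta(\beta)$ satisfy

$$\Delta'(\beta) = 1 - \tfrac{1}{2}\left(\beta - (\gamma - \mu)\right)^{-1/2} \quad \text{and} \quad \Delta''(\beta) = \tfrac{1}{4}\left(\beta - (\gamma - \mu)\right)^{-3/2} > 0,$$

an easy computation shows that $\Delta$ achieves its minimum at the point $\beta^{(0)} =
\frac{1}{4} + \gamma - \mu$. Therefore, each interval length is of size at least

$$\Delta(\beta^{(0)}) = \gamma - \mu - \tfrac{1}{4}.$$

This in turn means that the number $k - 1$ of intervals with $\beta_i > \gamma$ is at most

$$\left\lceil \frac{1 - \gamma}{\gamma - \mu - \frac{1}{4}} \right\rceil,$$

which concludes the proof.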
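The recurrence of Lemma 6 is easy to evaluate numerically. The sketch below is an illustration only; the function names `beta_sequence` and `lemma6_bound` are hypothetical, and floating-point square roots are used, so the values are approximations.

```python
from math import sqrt, ceil

def beta_sequence(gamma, mu):
    """Iterate beta_{i+1} = max(sqrt(beta_i - (gamma - mu)), gamma) from beta_0 = 1.

    Stops as soon as the sequence reaches gamma, so the last index is the k
    of Lemma 6. Requires 1/4 < gamma - mu < gamma < 1.
    """
    if not (0.25 < gamma - mu < gamma < 1):
        raise ValueError("need 1/4 < gamma - mu < gamma < 1")
    betas = [1.0]
    while betas[-1] > gamma:
        betas.append(max(sqrt(betas[-1] - (gamma - mu)), gamma))
    return betas

def lemma6_bound(gamma, mu):
    """Upper bound on k stated in Lemma 6."""
    return ceil((1 - gamma) / (gamma - mu - 0.25)) + 1

for gamma, mu in ((0.43, 0.06), (0.365, 0.05)):
    betas = beta_sequence(gamma, mu)
    print(gamma, mu, [round(b, 4) for b in betas], lemma6_bound(gamma, mu))
```

The two parameter pairs are the ones quoted in the caption of Figure 2; in both cases the number of steps stays below the bound of the lemma.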
[Figure: two number lines from 0 to 1 marking $\frac{1}{4}$, $\gamma$ and the values $\beta_i$.]

Fig. 2. Values Obtained for $(\gamma = 0.43, \mu = 0.06)$ and $(\gamma = 0.365, \mu = 0.05)$


We continue the proof of Theorem 3. We can use the algorithm FindFactor
from Lemma 5 with the parameters $(N, e, \beta_i, \mu)$ to test if there is any factor $p$
such that $e \mid p - 1$ in the sub-range $[N^{\beta_i}, N^{\beta_{i-1}}]$. If we run FindFactor multiple
times with the $\beta_i$ values computed using the relation in Lemma 6 we can test
the entire range as required.

We now discuss the choice of the parameter $\mu$. Lemma 5 gives us the condition
$\mu \leq \beta_i / 7$ for all values of $i$. We know from the proof of Lemma 6 that $\gamma \leq \beta_i$
for all values of $i$. Hence it is sufficient to pick $\mu$ such that $\mu \leq \gamma / 7$.

Furthermore, from Lemma 6 we have the condition $\mu < \gamma - \frac{1}{4}$. It is easy to
verify that both conditions

$$\mu \leq \gamma / 7 \quad \text{and} \quad \mu < \gamma - \tfrac{1}{4}$$

are satisfied by the choice $\mu := \frac{1}{2}(\gamma - \frac{1}{4}) = \frac{\epsilon}{2}$ for all $\gamma > \frac{1}{4}$.

We give the whole algorithm GCDDecide for deciding whether $\gcd(e, \varphi(N)) =
1$ in Figure 3.

It remains to determine the running time $t_{GCDDecide}$ of GCDDecide. We know
from Lemma 6 that we need at most $\lceil (1 - \gamma)/(\gamma - \mu - \frac{1}{4}) \rceil + 1$ iterations of
FindFactor, which can be bounded as

algorithm GCDDecide(N, e)
  if (e > N) then return 1
  γ = log_N e, ε = γ - 1/4
  if (ε < 0) then return ⊥
  μ = ε/2
  β_0 = 1, i = 0
  while (β_i >= γ)
    if (FindFactor(N, e, β_i, μ) ≠ ⊥) return e
    i++
    β_i = max{ sqrt(β_{i-1} - (γ - μ)), γ }
  return 1

algorithm RSACertify(N, e)
  if (!PRIME(e)) then return ⊥
  if (gcd(e, N) ≠ 1) then return ⊥
  if (GCDDecide(N, e) ≠ 1)
    then return false
    else return true


Fig. 3. GCD Decision and RSA Certification algorithms 
$$\left\lceil \frac{1 - \gamma}{\gamma - \mu - \frac{1}{4}} \right\rceil + 1 = O(\mu^{-1}) = O(\epsilon^{-1}).$$

Since each iteration takes time $O(\mu^{-7} \log^2 N)$, we obtain

$$t_{GCDDecide} = O(\mu^{-8} \log^2 N) = O(\epsilon^{-8} \log^2 N).$$

This concludes the proof of Theorem 3.

We now describe our full certification algorithm RSACertify that certifies the
RSA trapdoor permutation $\mathrm{RSA}_\gamma$ from Section 2.3 for $\gamma = 1/4 + \epsilon$. Note that
we assume in Theorem 3 that $e$ is prime and that $\gcd(e, N) = 1$. If we want
to check these prerequisites, we have an additional overhead of $O(\log^4 N)$ for
the primality test on $e$ and $O(\log^2 N)$ for the GCD computation. The complete
certification algorithm RSACertify is described in Figure 3. The total running
time of RSACertify, denoted by $t_{RSACertify}$, is given by the expression

$$t_{RSACertify} = O(\log^4 N) + O(\log^2 N) + t_{GCDDecide} = O(\max\{\log^4 N, \epsilon^{-8} \log^2 N\}).$$

Let $\mathrm{CRSA}_\gamma$ = (RSAGen$_\gamma$, RSAEval, RSAInvert, RSACertify), as described in Figures
1 and 3, where $\gamma$ controls the size of $e \approx N^\gamma$. By Theorem 3 we can see
that, for any $\gamma = 1/4 + 1/\mathrm{poly}(k)$, $\mathrm{CRSA}_\gamma$ defines a family of certified trapdoor
permutations with respect to Definition 2.
Abstract. Many lattice cryptographic primitives require an efficient algorithm
to sample lattice points according to some Gaussian distribution.
All algorithms known for this task require long-integer arithmetic at some
point, which may be problematic in practice. We study how much lattice
sampling can be sped up using floating-point arithmetic. First, we show
that a direct floating-point implementation of these algorithms does not
give any asymptotic speedup: the floating-point precision needs to be
greater than the security parameter, leading to an overall complexity
$O(n^3)$ where $n$ is the lattice dimension. However, we introduce a laziness
technique that can significantly speed up these algorithms. Namely, in
certain cases such as NTRUSign lattices, laziness can decrease the complexity
to $O(n^2)$ or even $O(n)$. Furthermore, our analysis is practical: for
typical parameters, most of the floating-point operations only require the
double-precision IEEE standard.

1 Introduction 

Lattice-based cryptography has been attracting considerable interest in the past 
few years (see the survey |Z2!), due to unique features such as security based on 
worst-case assumptions j3j or more recently fully-homomorphic encryption m 
But it has several differences compared to classical public-key cryptography 
based on factoring and discrete logarithms: in particular, the description of 
many lattice schemes (such as the seminal Ajtai-Dwork cryptosystem j2j and 
its LWE variants or schemes using lattice sampling B3) involves real num- 
bers at some point. Although the descriptions usually mention that one can 
replace these real numbers by approximations with sufficiently high precision, 
which guarantees efficiency in an asymptotical sense, the practical impact is un- 
clear: no article seems to specify exactly which precision one should take, and 
how all the operations will be performed exactly. This was not an issue when 
lattice-based cryptography was considered to be mostly of theoretical interest, 
but recent works |2 212 fill 712 811 812 flj suggest that the time has come to assess the 
practicality of lattice-based constructions. 

There is another reason to study carefully the use of floating-point arithmetic
in lattice-based cryptography. Many recent lattice schemes (e.g. trapdoor
signatures |12l(il and ID-based encryption fl 2171 I 12 j ) require a Gaussian sampler,

X. Wang and K. Sako (Eds.): ASIACRYPT 2012, LNCS 7658, pp. 415-g32] 2012. 
that is an efficient algorithm to sample lattice points according to a Gaussian-like
distribution, given a (short secret) basis and a target vector. There are two
approaches for this task: Klein’s randomized variant HU (as analyzed by Gentry
et al. na) of Babai’s nearest plane algorithm jSj , and algorithms [26121 )[ based
on convolution for the so-called $q$-ary lattices.

The cost of Klein’s algorithm is the same as Babai’s algorithm, namely
$O(n^3 \log B)$ (or $O(n^4 \log^2 B)$ without fast integer arithmetic), where $n$ is the
lattice dimension, and $B$ is the maximal norm of the input basis vectors: since
$B$ is polynomial in $n$ for trapdoor bases used in lattice cryptography, the usual
cost is $O(n^3)$ (or $O(n^4)$ without fast integer arithmetic). The main reason behind
the cost of Klein’s algorithm is the use of long-integer arithmetic: it relies on
Gram-Schmidt orthogonalization, which involves rational numbers of bit-length
$O(n \log B)$. A natural way to improve the efficiency is to use floating-point
arithmetic (FPA) to replace exact Gram-Schmidt by suitable approximations. Indeed,
Klein’s algorithm is a variant of Babai’s nearest plane algorithm, which itself is
simply the size-reduction subroutine used extensively in the LLL algorithm na;
and floating-point arithmetic is classically used to speed up LLL (see [21)12612.'*!) 1 .
But the use of FPA is not straightforward, and it is unclear at first sight how
much speedup can be gained, if any.

On the other hand, the convolution algorithms [26I2()[ based on Peikert’s
work m have two phases: an offline phase (depending on the secret basis only)
and an online phase (depending on the target vector). The online phase costs
$O(n^2)$ for $q$-ary lattices (which are widespread in lattice cryptography), or even
$O(n)$ in the so-called ring setting (i.e. special lattices such as NTRU lattices);
but the offline phase is the generation of a noise following some discrete Gaussian
distribution, which seems to have the same cost $O(n^3)$ as Klein’s algorithm, and
involves floating-point arithmetic whose exact cost is not analyzed in eeeh.
Both algorithms [26121 )j can use the same offline phase, which will later be
referred to as Peikert’s offline Algorithm.

It should be stressed that the offline phase is not a precomputation: this
phase must be repeated before each sampling, which is reminiscent of DSA
one-time pairs $(k, k^{-1})$, which can be precomputed as coupons or generated online;
but unlike a precomputation it should not be re-used. In some scenarios, this
computational cost might be acceptable, but it is clearly valuable to analyze
and improve the offline phase.

Our results. We develop techniques to improve all three samplers, obtaining the
first algorithms with quasi-optimal complexity to sample the discrete Gaussian
distribution over lattices: their running time is quasi-linear in the size of the input
basis. More precisely, our optimized variant of Klein’s algorithm runs in $O(n^2)$ (for
certain bases) and our variant of Peikert’s offline algorithm runs in average time
$O(n)$ in some ring setting (where $n$ is the lattice dimension). In both cases, our
improvements do not introduce any loss of quality.

To do so, we study how much lattice sampling can be sped up using FPA. As
a starting point, we present FPA variants of Klein’s algorithm with statistically
close output. Surprisingly, the basic FPA variant has the same asymptotic
complexity $O(n^3)$ as Klein’s algorithm, because the precision needs to be greater
than the security parameter. However, we also present an optimized algorithm
with an improved complexity $O(n^2)$: it is based on a so-called laziness technique
which combines high and low precision FPA. But this optimized complexity only
applies to a special class of bases which includes NTRUSign bases [EH, namely
those for which the inverse basis is small.

Next, we show that the same optimization can be used to speed up Peikert’s 
offline algorithm, improving the total complexity, to bring its offline complexity 
down to that of its online complexity for both sampling algorithms of 1291201 . 
More precisely, we apply our laziness technique to reduce the offline complexity 
to O (n 2 ) . And for certain ring settings (precisely when the ring is 1Z = X h ±l), we 
show that the offline phase can also be sped up to average quasi-linear time. This 
is achieved by using two additional tricks: a structured square-root algorithm and 
an improved rejection sampler for Gaussians over Z. 

As a direct application of this last result, one can strengthen the security of 
NTRUSign jE3 by replacing their heuristic perturbation technique with our 
optimized sampler, without any loss of efficiency asymptotically. This prevents 
learning attacks [2411 l)j on NTRUSign as the signature scheme is now provably 
secure in the random-oracle model (see [El), under the (reasonable) assumption 
that finding close vectors in NTRUSign lattices is hard. 

While numerical analysis has often been used [2 912 512 3 j to speed up lattice
reduction algorithms in a rigorous way, our work might be its first application
to provable security.

Practical impact of laziness. The precision used for floating-point arithmetic has 
non- negligible practical impact, because fp-operations become much more expen- 
sive when the precision goes over the hardware precision. For instance, modern 
processors typically provide floating-point arithmetic following the double IEEE 
standard (53-bit precision), but quad-float FPA (113-bit precision simulated by 
software libraries) is usually about 10-20 times slower for basic operations, and 
the overhead is much more for multiprecision FPA. 

Our complexity results are stated in an asymptotical manner, but our analysis 
can give concrete bounds (which are provided in the full version 0). It turns 
out that in typical cryptographic settings, the double-precision (53-bit) IEEE 
standard can be selected as the “low precision” of our lazy algorithm, which 
means that most of our fp-operations are hardware fp-operations, even though 
the security level is not limited to 53 bits. 

Roadmap. We start in Sect. 2 with background and notation on lattices, sampling
and FPA. In Sect.0 we present our basic FPA variant of Klein’s algorithm, which
we optimize using laziness in Sect. El In Sect. 0 we apply laziness to speed up
Peikert’s Offline Algorithm. Finally, in Sect . 0 we explain how to reach
quasi-linear time complexity in the ring setting. Missing proofs and additional details,
such as non-asymptotic bounds, can be found in the full version 0.
2 Preliminaries

Throughout the paper, we use row representations of matrices (to match lattice
software), and use bold fonts to denote vectors: if $B = (\mathbf{b}_1, \dots, \mathbf{b}_n)$ is a
matrix, then its row vectors are the $\mathbf{b}_i$’s. Notation $\mathcal{M}_n$ and $\mathcal{S}_n^+$ denote respectively
the square matrices, and the square symmetric definite positive matrices
of dimension $n$ over $\mathbb{R}$.


2.1 Notation 

Lattices. Lattices are discrete subgroups of $\mathbb{R}^m$. A lattice $L$ is represented by
a basis, that is, a set of linearly independent vectors $\mathbf{b}_1, \dots, \mathbf{b}_n$ in $\mathbb{R}^m$ such
that $L$ is equal to the set $L(\mathbf{b}_1, \dots, \mathbf{b}_n) = \{\sum_{i=1}^n x_i \mathbf{b}_i,\ x_i \in \mathbb{Z}\}$ of all integer
linear combinations of the $\mathbf{b}_i$’s. The integer $n$ is the dimension of the lattice $L$.
The volume $\mathrm{vol}(L)$ is the $n$-dimensional volume of the parallelepiped generated
by any basis of $L$. In lattice-based cryptography, one mainly uses the so-called
$q$-ary lattices, which include NTRU lattices jl 4tl and Ajtai’s worst-case/average-case
lattices 0. A $q$-ary lattice is simply a full-rank integer lattice $L \subseteq \mathbb{Z}^n$ such
that $q\mathbb{Z}^n \subseteq L$, where $q$ is a somewhat small integer. For such a lattice, $\mathrm{vol}(L)$
divides $q^n$.

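Norms. For a vector $\mathbf{x} \in \mathbb{R}^n$, $\|\mathbf{x}\| = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle}$ will denote its Euclidean norm.

The norm of a matrix $B$ is the maximal norm of its rows: $\|B\| = \max_{i=1}^{n} \|\mathbf{b}_i\|$.
The spectral norm of a square $n \times n$ matrix $M$ is: $\|M\|_s = \max_{\mathbf{x} \in \mathbb{R}^n \setminus \{0\}} \|\mathbf{x}M\| / \|\mathbf{x}\|$.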

Orthogonalization. An $n \times m$ basis $B = (\mathbf{b}_1, \dots, \mathbf{b}_n)$ can be written uniquely as
$B = \mu \cdot D \cdot Q$ where $\mu = (\mu_{i,j})$ is an $n \times n$ lower-triangular matrix with unit
diagonal, $D$ an $n$-dimensional positive diagonal matrix and $Q$ an $n \times m$ matrix
with orthonormal row vectors. Then $\mu D$ is a lower triangular representation of
$B$ (with respect to $Q$), $B^* = DQ = (\mathbf{b}_1^*, \dots, \mathbf{b}_n^*)$ is the Gram-Schmidt
orthogonalization of the basis, and $D$ is the diagonal matrix formed by the $\|\mathbf{b}_i^*\|$’s. With
those notations, we have $\mu_{i,j} = \langle \mathbf{b}_i, \mathbf{b}_j^* \rangle / \|\mathbf{b}_j^*\|^2$.

For any $\sigma > 0$, we let $\sigma_i = \sigma / \|\mathbf{b}_i^*\|$ and $\bar{\sigma} = \max_{i=1}^{n} \sigma_i$. Since the $\mathbf{b}_i^*$’s are
orthogonal, we have $\bar{\sigma} = \sigma / (\min_{i=1}^{n} \|\mathbf{b}_i^*\|) = \sigma \|B^{*-1}\|_s \leq \sigma \|B^{-1}\|_s \|\mu\|_s \leq
\sigma \|B^{-1}\|_s\, n \bar{\mu}$ where $\bar{\mu} \geq 1$ upper bounds the coefficients of $\mu$.
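The decomposition above can be sketched with exact rational arithmetic. This is a minimal illustration rather than the paper's algorithm: the function name `gram_schmidt` is hypothetical, and the sketch stops at $B = \mu B^*$ (with $B^* = DQ$) so that every entry stays rational, since the norms on the diagonal of $D$ are irrational in general.

```python
from fractions import Fraction

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(B):
    """Return (mu, Bstar) with B = mu * Bstar, using rows as vectors.

    mu is lower triangular with unit diagonal and
    mu[i][j] = <b_i, b*_j> / ||b*_j||^2 for j < i.
    """
    n = len(B)
    mu = [[Fraction(0)] * n for _ in range(n)]
    Bstar = []
    for i in range(n):
        v = [Fraction(x) for x in B[i]]
        for j in range(i):
            mu[i][j] = dot(B[i], Bstar[j]) / dot(Bstar[j], Bstar[j])
            v = [a - mu[i][j] * b for a, b in zip(v, Bstar[j])]
        mu[i][i] = Fraction(1)
        Bstar.append(v)
    return mu, Bstar

mu, Bstar = gram_schmidt([[3, 1], [2, 2]])
print(mu[1][0], Bstar[1])
```

Even on this 2x2 example the entries of $B^*$ are no longer integers, which hints at why exact orthogonalization of large bases involves rationals of large bit-length.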

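Gaussian Distribution. The (unnormalized) weight of the Gaussian distribution of
parameter $\sigma \in \mathbb{R}$ and center $c \in \mathbb{R}$ at $x \in \mathbb{R}$ is defined by $\rho_{\sigma,c}(x) = \exp(-\pi (x - c)^2 / \sigma^2)$,
and more generally by $\rho_{\sigma,\mathbf{c}}(\mathbf{x}) = \exp(-\pi \|\mathbf{x} - \mathbf{c}\|^2 / \sigma^2)$ for $\mathbf{c}, \mathbf{x} \in \mathbb{R}^n$. The
discrete Gaussian distribution over $\mathbb{Z}$ is defined by $D_{\mathbb{Z},\sigma,c}(x) = \rho_{\sigma,c}(x) / \rho_{\sigma,c}(\mathbb{Z})$,
and more generally, over a lattice $L$ by $D_{L,\sigma,\mathbf{c}}(\mathbf{x}) = \rho_{\sigma,\mathbf{c}}(\mathbf{x}) / \rho_{\sigma,\mathbf{c}}(L)$. Peikert |2S|
generalized the discrete Gaussian distribution over a lattice $L$ using a positive
definite matrix $\Sigma > 0$ (which generalizes $\sigma \in \mathbb{R}$) as follows: the density
$D_{L,\sqrt{\Sigma},\mathbf{c}}(\mathbf{x})$ is proportional to $\rho_{1,0}((\mathbf{x} - \mathbf{c})B^{-1})$ where $\Sigma = B^t B$, for $\mathbf{x} \in L$.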
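For the one-dimensional case, the weight $\rho_{\sigma,c}$ and the distribution $D_{\mathbb{Z},\sigma,c}$ can be illustrated with a textbook rejection sampler. This is a naive sketch and not the improved rejection sampler developed in the paper: the function names and the default tailcut `tau = 12.0` are assumptions of this example, and double-precision floats are used throughout without any error analysis.

```python
import math
import random

def rho(x, sigma, c=0.0):
    """Unnormalized Gaussian weight exp(-pi * (x - c)^2 / sigma^2)."""
    return math.exp(-math.pi * (x - c) ** 2 / sigma ** 2)

def sample_z(sigma, c, rng, tau=12.0):
    """Naive rejection sampler for the discrete Gaussian over Z.

    Draw an integer uniformly from [c - tau*sigma, c + tau*sigma] and accept
    it with probability rho(x); the mass outside this window is ignored.
    """
    lo = math.ceil(c - tau * sigma)
    hi = math.floor(c + tau * sigma)
    while True:
        x = rng.randint(lo, hi)
        if rng.random() < rho(x, sigma, c):
            return x

rng = random.Random(0)
samples = [sample_z(3.0, 0.5, rng) for _ in range(2000)]
print(sum(samples) / len(samples))
```

The acceptance rate of this sampler is roughly $1/(2\tau)$, which is one reason more careful samplers matter in practice.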


Faster Gaussian Lattice Sampling 




2.2 Gaussian Lattice Sampling 

The goal of Gaussian lattice sampling is to efficiently sample lattice points
according to a distribution statistically close to $D_{L,\sigma,\mathbf{c}}$. All lattice samplers
known | 1 51 1 2120121 )j have constraints on the parameter $\sigma$ and the statistical
distance, which are related to the so-called smoothing parameter. The sampling
parameter $\sigma$ determines the average distance of the sampled lattice point to the
target point: the smaller $\sigma$, the better for cryptographic applications. For instance,
$\sigma$ impacts the verification threshold of lattice-based signatures m and
therefore the security of the scheme; a lower quality forces one to increase lattice
parameters. And for a security level of $\lambda$ bits, we need a statistical distance less
than $2^{-\lambda}$.

Smoothing Parameter. For any $n$-dimensional lattice $L$ and any real $\epsilon > 0$,
the smoothing parameter $\eta_\epsilon(L)$ (see [21 \ 1 is the smallest real $s > 0$ such that
$\rho_{1/s}(L^* \setminus \{0\}) \leq \epsilon$, where $L^*$ is the dual lattice of $L$. For details on the importance
of this parameter, please refer to [21I12[ .

Klein’s sampling. Gentry et al. showed in H3 that given as input a lattice
basis $B$ of an $n$-dimensional lattice $L$ such that $\sigma \geq \|B^*\| \omega(\sqrt{\log n})$, Klein’s
algorithm Q5I outputs lattice points with a distribution statistically close to
$D_{L,\sigma,\mathbf{c}}(\mathbf{x})$. For applications, it is more convenient to have a concrete bound on
the statistical distance, and to separate this bound from the lattice dimension
$n$. We therefore use the following concrete analysis of Klein’s algorithm:

Theorem 1 (Concrete version of |T21 Th. 4.1]). Let $n, \lambda \in \mathbb{N}$ be any positive
integers, and $\epsilon = 2^{-\lambda}/(2n)$. For any $n$-dimensional lattice $L$ generated by a
basis $B \in \mathbb{Z}^{n \times n}$, and for any target vector $\mathbf{c} \in \mathbb{Z}^{1 \times n}$, Alg. $ is such that the
statistical distance $\Delta(D_{L,\sigma,\mathbf{c}}, \mathrm{SampleLattice}^{\infty}(B, \sigma, \mathbf{c}))$ is less than $2^{-\lambda}$, under
the condition:

$\sigma \ge \|B^*\| \, \eta_\epsilon(\mathbb{Z})$ where $\eta_\epsilon(\mathbb{Z}) \approx \sqrt{(\lambda \ln 2 + \ln n)/\pi}$.

Tailcut. We will also use a tailcut parameter $\tau$, chosen such that (informally)
a sample from a normal distribution of parameter $\sigma$ is at distance at most $\tau\sigma$
from the center with overwhelming probability:

Corollary 1 (Tailcut error, Corollary of [211 Lemma 2.10] ). Let L be 

an n-dimensional lattice, i < 1/2, a > r] 0 (L), r > 1 5 T G (0,1) and c e R n . 
For x <- Dl, <t, c we have: Pr [||x — c|| > (1 — 8 t )to\ < 3E ta iicut ( t, S T ) n where 
Ataiicut ( t, Sr) = Ty/2re ■ e-’di-MV 


2.3 Floating-Point Arithmetic 

We consider floating-point arithmetic (FPA) with $m$ bits of mantissa, which we
denote by $\mathrm{FP}_m$: the precision is $\varepsilon = 2^{-m+1}$. A floating-point number $f \in \mathrm{FP}_m$ is a
triplet $f = (s, e, v)$ where $s \in \{0, 1\}$, $e \in \mathbb{Z}$ and $v \in \{0, \dots, 2^m - 1\}$, which represents the
real number $R(f) = (-1)^s \cdot 2^{e-m} \cdot v \in \mathbb{R}$. Every FPA-operation $\bar{\circ} \in \{\bar{+}, \bar{-}, \bar{\times}, \bar{/}\}$
and its respective arithmetic operation on $\mathbb{R}$, $\circ \in \{+, -, \times, /\}$, verify:

$\forall f_1, f_2 \in \mathrm{FP}_m, \quad |R(f_1 \,\bar{\circ}\, f_2) - (R(f_1) \circ R(f_2))| \le |R(f_1) \circ R(f_2)| \, \varepsilon \qquad (1)$

We require a floating-point implementation of the exponentiation function $\overline{\exp}(\cdot)$
and we assume that it verifies a similar error bound: for any $f \in \mathrm{FP}_m$,
$|R(\overline{\exp}(f)) - \exp(R(f))| \le \varepsilon$. Finally, we note that if an integer $x \in \mathbb{Z}$
verifies $|x| \le 2^m$, it can be converted to a float $f \in \mathrm{FP}_m$ with no error, i.e. $R(f) = x$.
For the rest of the article, we omit the function $R$ and consider $\mathrm{FP}_m$ as a subset
of $\mathbb{R}$.


2.4 Pseudo-code 

Types. Variables are typed, and the type is given at each initialization and
assignment, as follows: variable $\leftarrow$ value : type. We use a simpler syntax for the
definition of local functions: {variable $\mapsto$ value}. Functional types are denoted
by ($t_1 \to t_2$).

Primitives. We use the basic arithmetic operations $\{+, -, \cdot, /\}$, as well as
squaring $\square^2$ and exponentiation exp; the arguments are either integers in $\mathbb{Z}$, or
floating-point numbers in $\mathrm{FP}_m$. We extend these notations to vectors and
matrices. We also use the following additional primitives:

RandInt($a$, $b$) : $\mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}$ : return a random uniform integer in the range $[a, b]$.
RandFloat$_m$() : void $\to \mathrm{FP}_m$ : return a random uniform float in the range $[0, 1)$.
ExtRandFloat$_{m' \to m}$($r$) : $\mathrm{FP}_{m'} \to \mathrm{FP}_m$ : return a random uniform floating-point
number in the range $[r, r + 2^{-m'})$. For a random $r \leftarrow$ RandFloat$_{m'}$(), the output
follows the same distribution as RandFloat$_m$().
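The last primitive can be illustrated with a minimal Python sketch. It assumes a fixed-point model in which a "float" in $[0, 1)$ with $k$ bits is an exact fraction over $2^k$; this model and the function names `rand_float` and `ext_rand_float` are assumptions of the sketch, not the paper's implementation.

```python
import random
from fractions import Fraction

def rand_float(bits, rng):
    """Uniform value in [0, 1) with `bits` fractional bits (hypothetical fixed-point model)."""
    return Fraction(rng.getrandbits(bits), 2 ** bits)

def ext_rand_float(r, low_bits, high_bits, rng):
    """Extend a low-precision sample r with fresh random low-order bits.

    The result is uniform among high_bits-bit values in [r, r + 2^-low_bits).
    """
    extra = high_bits - low_bits
    return r + Fraction(rng.getrandbits(extra), 2 ** high_bits)

rng = random.Random(0)
r_low = rand_float(8, rng)                    # plays the role of RandFloat_{m'} with m' = 8
r_high = ext_rand_float(r_low, 8, 32, rng)    # extended to m = 32 bits
```

Drawing the leading 8 bits first and the remaining 24 bits later gives the same distribution as drawing all 32 bits at once, which is why a sampler can postpone the extra bits until they are needed.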

3 A Basic Floating-Point Variant of Klein’s Algorithm 

3.1 Description 

Algorithm 2 describes both Klein’s algorithm [15] and our basic floating-point
variant: given a basis $B$ of a lattice $L$, a target $c$ and a parameter $\sigma$, the
algorithm outputs a vector with distribution statistically close to $D_{L,\sigma,c}$. It uses
two subroutines: DecomposeGS$_m$ (Alg. 3) to compute the coordinates $t_i$'s of
the target vector $c$ with respect to the Gram-Schmidt basis $B^*$, and SampleZ$_m$
(Alg. 1) to sample according to the Gaussian distribution over $\mathbb{Z}$. Algorithm 2
comes in two flavors:

- SampleLattice$_\infty$ is the exact version, which corresponds to Klein’s original
algorithm H3. The $\mu_{i,j}$'s and the $t_i$'s are represented exactly by rational
numbers, and all the computations use exact integer arithmetic. Assuming
$\sigma \in \mathbb{Q}$, we can only ensure that $\sigma_i \in \sqrt{\mathbb{Q}}$, thus we can represent them exactly
by their square. We also assume that this version has access to a perfect
primitive (or an oracle) SampleZ$_\infty(\sigma_i, t_i, \tau = \infty)$ that given $\sigma_i^2, t_i \in \mathbb{Q}$
answers an integer $x : \mathbb{Z}$ exactly according to the distribution $D_{\mathbb{Z},\sigma_i,t_i}$. It
does not matter how to sample such a perfect distribution, as the purpose
of this perfect algorithm is to be a reference for inexact ones.

- SampleLattice$_m$ is our basic floating-point version, using $\mathrm{FP}_m$. The
matrices $\mu$ and $B^*$ and values $\sigma_i$ may have been pre-computed exactly, but only
approximations are stored.


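Algorithm 1. SampleZ$_m$: Rejection Sampling for Discrete Gaussian on $\mathbb{Z}$
input: a center $t : \mathrm{FP}_m$, a parameter $\sigma : \mathrm{FP}_m$, and a tailcut parameter $\tau : \mathrm{FP}_m$
output: $x : \mathbb{Z}$, with distribution statistically close to $D_{\mathbb{Z},\sigma,t}$

1: $h \leftarrow -\pi/\sigma^2 : \mathrm{FP}_m$ ; $x_{\max} \leftarrow \lceil t + \tau\sigma \rceil : \mathbb{Z}$ ; $x_{\min} \leftarrow \lfloor t - \tau\sigma \rfloor : \mathbb{Z}$
2: $x \leftarrow$ RandInt($x_{\min}$, $x_{\max}$) : $\mathbb{Z}$ ; $p \leftarrow \exp(h \cdot (x - t)^2) : \mathrm{FP}_m$
3: $r \leftarrow$ RandFloat$_m$() : $\mathrm{FP}_m$ ; if $r < p$ then return $x$
4: Goto Step 2.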
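The rejection loop of Algorithm 1 can be sketched in a few lines of Python. Here the 53-bit Python `float` stands in for $\mathrm{FP}_m$, and the function name `sample_z` as well as the demo parameters are assumptions of this sketch.

```python
import math
import random

def sample_z(t, sigma, tau, rng=random):
    """Rejection sampler for the discrete Gaussian over Z centred at t (sketch of Algorithm 1)."""
    h = -math.pi / sigma ** 2
    x_max = math.ceil(t + tau * sigma)
    x_min = math.floor(t - tau * sigma)
    while True:
        x = rng.randint(x_min, x_max)        # uniform candidate in the tail-cut window
        p = math.exp(h * (x - t) ** 2)       # rho_{sigma,t}(x), always in (0, 1]
        if rng.random() < p:                 # accept with probability p
            return x

rng = random.Random(1)
samples = [sample_z(0.3, 4.0, 6.0, rng) for _ in range(2000)]
```

With these conventions the accepted values concentrate around the center $t$ with standard deviation $\sigma/\sqrt{2\pi}$, since the weight is $\exp(-\pi (x-t)^2/\sigma^2)$.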


Algorithm 2. SampleLattice$_m$: Gaussian Sampling over a lattice
input: a (short) lattice basis $B = (b_1, \dots, b_n) : \mathbb{Z}^{n \times n}$, a parameter $\sigma : \mathrm{FP}_m$, a target
vector $c : \mathbb{Z}^{1 \times n}$, and a tailcut parameter $\tau : \mathrm{FP}_m$
Precomputation: the GS decomposition ($B^* = (b_1^*, \dots, b_n^*)$, $(\mu_{i,j}) = (\mu_1, \dots, \mu_n)$), norms $r_i = \|b_i^*\| : \mathrm{FP}_m$
and $\sigma_i = \sigma/r_i : \mathrm{FP}_m$
output: a vector $v : \mathbb{Z}^{1 \times n}$ drawn approximately from $D_{L,\sigma,c}$ where $L = L(B)$

1: $v, z \leftarrow 0 : \mathbb{Z}^n$ ; $t \leftarrow$ DecomposeGS$_m$($c$, $B^*$) : $\mathrm{FP}_m^n$
2: for $i = n$ downto 1 do
3:   $z_i \leftarrow$ SampleZ$_m$($\sigma_i$, $t_i$, $\tau$) : $\mathbb{Z}$
4:   $v \leftarrow v + z_i \cdot b_i : \mathbb{Z}^n$ ; $t \leftarrow t - z_i \cdot \mu_i : \mathrm{FP}_m^n$
5: end for
6: return $v$




Algorithm 3. DecomposeGS$_m$: Decompose a vector $c$ over the GS Basis
input: a vector $c : \mathbb{Z}^{1 \times n}$, an orthogonal basis $B^* = (b_1^*, \dots, b_n^*) : \mathbb{Q}^{n \times n}$, and $r_i^2 = \|b_i^*\|^2 \in \mathrm{FP}_m$
output: $t : \mathbb{Q}^n$ such that $c = t_1 b_1^* + \dots + t_n b_n^*$

1: $y \leftarrow c \cdot B^{*t} : \mathbb{Z}^{1 \times n}$
2: return $(y_1/r_1^2, \dots, y_n/r_n^2)$
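Algorithms 2 and 3 can be put together in a short Python sketch. It assumes that basis vectors are the rows of `B`, that Python floats stand in for $\mathrm{FP}_m$, and that the Gram-Schmidt data is recomputed on each call instead of being precomputed; the helper names and the toy basis are illustrative assumptions.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sample_z(t, sigma, tau, rng):
    """Rejection sampling over Z around t (same idea as Algorithm 1)."""
    h = -math.pi / sigma ** 2
    x_min, x_max = math.floor(t - tau * sigma), math.ceil(t + tau * sigma)
    while True:
        x = rng.randint(x_min, x_max)
        if rng.random() < math.exp(h * (x - t) ** 2):
            return x

def gram_schmidt(B):
    """Return (B*, mu) such that b_i = sum_j mu[i][j] * b*_j, with mu[i][i] = 1."""
    n = len(B)
    Bs, mu = [], [[0.0] * n for _ in range(n)]
    for i in range(n):
        v = [float(x) for x in B[i]]
        for j in range(i):
            mu[i][j] = dot(B[i], Bs[j]) / dot(Bs[j], Bs[j])
            v = [a - mu[i][j] * b for a, b in zip(v, Bs[j])]
        mu[i][i] = 1.0
        Bs.append(v)
    return Bs, mu

def sample_lattice(B, sigma, c, tau, rng):
    """Klein-style sampler: returns a lattice vector of L(B) close to c."""
    n = len(B)
    Bs, mu = gram_schmidt(B)
    r2 = [dot(b, b) for b in Bs]                       # squared Gram-Schmidt norms
    t = [dot(c, Bs[i]) / r2[i] for i in range(n)]      # DecomposeGS: coordinates of c
    v = [0] * n
    for i in reversed(range(n)):
        z = sample_z(t[i], sigma / math.sqrt(r2[i]), tau, rng)
        v = [a + z * b for a, b in zip(v, B[i])]            # v <- v + z_i * b_i
        t = [tj - z * mu[i][j] for j, tj in enumerate(t)]   # t <- t - z_i * mu_i
    return v

B = [[5, 1], [1, 6]]
c = [13, -7]
rng = random.Random(3)
v = sample_lattice(B, 20.0, c, 6.0, rng)
```

The update of `t` after each sampled `z` is the unrolled form of the sum over the already-chosen coefficients, so no coordinate is ever recomputed from scratch.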


The description of SampleLattice$_\infty$ differs from the original description
[15,12] only in the way we compute and update the coordinates $t_i$'s. In our
version, the final value of $t_i$ before it is used is $t_i = \langle c, b_i^* \rangle / r_i^2 - \sum_{j>i} z_j \mu_{j,i}$,
which matches with the original value:

$t_i = \left\langle c - \sum_{j>i} z_j b_j,\ b_i^* \right\rangle / r_i^2 = \langle c, b_i^* \rangle / r_i^2 - \sum_{j>i} z_j \langle b_j, b_i^* \rangle / r_i^2 = \langle c, b_i^* \rangle / r_i^2 - \sum_{j>i} z_j \mu_{j,i}$

We unroll this computation and update the sum after each value $z_i$ is known.
This allows a parallelization up to $n$ processors without the usual $\log n$ factor
required for summing up all terms.

Since we use the matrix $\mu$ in the main loop, we might want to get rid of $B^*$
for the DecomposeGS algorithm, to save some precomputation and storage,
by computing $c' \leftarrow c \cdot B^t$ and then solving the triangular system $y \cdot \mu^t = c'$.
Solving this system also requires $n^2$ operations, however when using FPA, it would
produce a relative error exponential in the dimension $n$, because we recursively
use previous results.

Our main loop may also be seen as solving a triangular system, where we
apply Gaussian rounding at each step. It is worth noting that this additional
rounding prevents such relative exponential error, as our proof will show.

Efficiency of SampleLattice$_\infty$. The algorithm SampleLattice$_\infty$ performs
$O(n^2)$ arithmetic operations on rational numbers of size $O(n \log B)$, which leads
to a complexity of $O(n^4)$ for cryptographic use. Here, we ignored the calls to the
oracle SampleZ$_\infty(\cdot, \cdot, \tau = \infty)$.

Termination of SampleZ$_\infty(\cdot, \cdot, \tau < \infty)$. We upper bound the number of trials of
Rejection Sampling, ignoring issues related to the transcendental function exp:

Fact 2. If $\sigma \ge 4$ and $\tau \ge 1$, and uniforms $x \leftarrow \mathbb{Z} \cap [x_{\min}, x_{\max}]$ and $r \leftarrow [0, 1)$,
we have $\Pr[r < \rho_{\sigma,t}(x)] \ge 1/(6\tau)$ where $x_{\min} = \lceil t - \tau\sigma \rceil$ and $x_{\max} = \lfloor t + \tau\sigma \rfloor$.

Thus SampleZ$_\infty(\cdot, \cdot, \tau)$ performs less than $6\tau$ trials on average.
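The acceptance probability bounded in Fact 2 can be computed directly for given parameters. The sketch below averages the Gaussian weight $\exp(-\pi (x-t)^2/\sigma^2)$ over the tail-cut window; the function name `acceptance_probability` and the sample values of $t$, $\sigma$, $\tau$ are assumptions chosen only for illustration.

```python
import math

def acceptance_probability(t, sigma, tau):
    """Probability that one rejection-sampling trial accepts, for uniform x in the window."""
    x_min = math.ceil(t - tau * sigma)
    x_max = math.floor(t + tau * sigma)
    window = range(x_min, x_max + 1)
    total = sum(math.exp(-math.pi * (x - t) ** 2 / sigma ** 2) for x in window)
    return total / len(window)

prob = acceptance_probability(t=0.37, sigma=4.0, tau=6.0)
expected_trials = 1.0 / prob
```

Since the sum of the weights over the window is close to $\sigma$ and the window holds about $2\tau\sigma$ integers, the acceptance probability is close to $1/(2\tau)$, comfortably above the $1/(6\tau)$ stated in Fact 2.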


3.2 Correctness 

We give the list of assumptions needed for our correctness results (Theorems 3
and 5), and which we refer to as conditions A.

Assumption on Gram-Schmidt Precomputation. We assume that the
Gram-Schmidt values are (possibly approximately) precomputed, and that
the computed values $\bar{\mu}_{i,j}$, $\bar{b}^*_{i,j}$ and $\bar{\sigma}_i$ verify:

\AfMj\ = \m,j ~ fM,j\ < \ Ah h\ = \K j ~ Kj\ - ll b i II e > 

\A*4 = ki - *4 < 


where $\hat{\mu}$ denotes the maximal absolute value of the sub-diagonal coefficients of
$\mu$. Those conditions can be achieved by running the precomputation exactly,
then converting the result to floating-point numbers of mantissa size $m$.

Assumption on the Target Vector. We assume that the components $c_i$ of
the input target vector $c$ satisfy: $|c_i| \le q$ for a parameter $q$. This holds in all
known cryptographic applications of lattice sampling, for which the lattice
is $q$-ary. But we do not require that the lattice is $q$-ary.
Assumption on the Parameters

$\varepsilon \le 0.01$, $K_n = (1 + \varepsilon)^n \le 1.1$, $1 + nK_n\varepsilon \le 1.01$,
$\epsilon \le 0.01$, $\forall i,\ \sigma_i \ge \eta_\epsilon(\mathbb{Z})$, $\forall i,\ \sigma_i \ge 4$,
$n \ge 10$, $\tau \ge 4$.

The assumptions on the precision $\varepsilon$ are easily achievable for a mantissa size $m$ at least
logarithmic in the dimension $n$. The condition on the smoothing bound $\epsilon$ is not restrictive as it
needs to be negligible. Similarly, conditions on the $\sigma_i$'s are not restrictive since
the security requires all $\sigma_i \ge \eta_\epsilon(\mathbb{Z}) \ge 4$ for security parameters $\lambda \ge 80$.

For the rest of the analysis, we assume that all parameters $B$, $c$ and $\sigma$ are fixed.
Our main result states that with enough precision, the outputs of the exact
sampler SampleLattice$_\infty$ and the floating-point sampler SampleLattice$_m$ are
statistically close:

Theorem 3. There exist constants $C_\lambda$, $C_\tau$, $C_m$, such that for any security
parameter $\lambda \ge C_\lambda$, and under conditions A, the statistical distance between
SampleLattice$_m$ and SampleLattice$_\infty$ is less than $2^{-\lambda}$ on the same input
if the following conditions are satisfied:

$\tau \ge C_\tau \sqrt{\lambda \log n}$, $\qquad m \ge C_m + \lambda + 2\log_2(\|B^{-1}\|_s) + \log_2(\hat{\mu}^2 n^4 (q + \sigma^2) \tau^3)$

Furthermore, under those conditions, the integers manipulated by
SampleLattice$_m$ can be represented by floating-point numbers without errors.

3.3 Efficiency 

We deduce the efficiency of the basic floating-point sampler from Theorem 3.
We first analyze SampleZ$_m$:

Fact 4. There is a constant $C_m$ such that for any $m \ge C_m$, and any $\tau \ge 1$,
SampleZ$_m(\cdot, \cdot, \tau)$ performs less than $6\tau$ trials on the average.

This can be easily derived from Fact 2 and an appropriate error bound (see full
version). This ensures that SampleLattice$_m$ performs $\approx 6n^2$ $\mathrm{FP}_m$-operations
as long as $\tau = o(n)$.

Arbitrary bases. To minimize the FPA-precision $m$ in Theorem 3 we need to
evaluate $\log(\|B^{-1}\|_s)$: this is always less than $\approx n \log(B)$ by Cramer’s rule. This
leads to the constraint $m \ge \lambda + n\ell$ where $\ell$ is logarithmic in $n$ and $B$, yielding
a $\tilde{O}(n^3)$ bit-complexity as long as $\lambda = O(n)$, or $O(n^4)$ without fast integer
arithmetic.

The exact algorithm SampleLattice$_\infty$ also has complexity $\tilde{O}(n^3)$. However,
the constants are likely to be smaller for the FPA sampler. Indeed, the exact
algorithm must handle integers of size $\log(\max_{1 \le i \le j \le n} \mathrm{vol}(b_i, \dots, b_j))$, whereas the
quantity $\log(\|B^{-1}\|_s)$ is typically smaller, though they have similar worst-case
asymptotic bounds. And the constants of the FPA sampler can be improved
by processing the basis, for instance using LLL reduction.

Furthermore, in cryptographic applications, we may focus on bases B of a 
particular shape. More precisely, we will consider the following type of basis: 
Small-inverse bases. A sequence $\mathcal{C} = (\mathcal{C}_n)$ of square matrices generating $q_n$-ary
lattices of dimension $n$ is a class of small-inverse bases if there exists a polynomial
function $f$ such that for any basis $B \in \mathcal{C}_n$, $\|B\|_s \le f(n)$ and $\|B^{-1}\|_s \le f(n)$.

In particular, the bases used by the NTRUSign signature scheme m form
a small-inverse class (see US). For such bases, we only need $m \ge \lambda + \ell$ for $\ell$
logarithmic in $\lambda$. This still gives a $\tilde{O}(n^3)$ complexity for cryptographic use (when
$\lambda \sim n$), but with much better constants.

4 A Lazy Floating-Point Variant of Klein’s Algorithm 

Overview. We now describe our optimized sampler, which is more efficient than
the basic sampler, due to a better use of FPA. The analysis of the basic sampler
showed that it was sufficient to compute $t_i$ up to $\approx \lambda$ bits below the unity to get
an error below $2^{-\lambda}$ on the output distribution. However, a careful analysis of
the rejection sampling algorithm (Alg. 1) shows that most of the time, many of
those bits are not used: the precision of $t_i$ impacts the precision of $p = \rho_{\sigma_i,t_i}(x)$,
which is only used to make a comparison with a uniform random real $r \in [0, 1)$.
For all $j \ge 1$, such a comparison is determined by the first $j$ bits, except with
probability $2^{-j}$ (exactly when the $j$ first bits of $r$ and $p$ match); and on average
only the first two bits contribute to the decision.

However, we still need to decide this comparison properly even when the first
$j < \lambda$ bits match, to output a proper distribution. This suggests a new strategy:
compute lazily the bits of $t_i$ and $p$. We first only compute the most significant bits
and backtrack for additional bits until the comparison can be determined. We
choose a simple laziness control, using only two levels of precision (for simplicity,
but also for practical efficiency). Informally, we choose $k < \lambda$, compute $t_i$ up
to a precision $m'$ that only guarantees the first $k$ bits of $p$, and draw the first $k$ bits
of the random real $r$. If the comparison is decided with those $k$ bits, continue
normally. Otherwise (which happens with probability less than $2^{-k}$), recompute
$t_i$ and $p$ at a precision $m$ to ensure $\lambda$ correct bits.
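The claim that about two bits decide the comparison on average can be checked exactly on a small model. The sketch below assumes that $r$ is a uniform $k$-bit string and counts how many leading bits must be read before $r$ and a fixed $p$ differ; the function name `bits_to_decide`, the choice $k = 8$ and the particular $p$ are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def bits_to_decide(r_bits, p_bits):
    """Number of leading bits read before `r < p` is decided (all k bits if they tie)."""
    for j, (rb, pb) in enumerate(zip(r_bits, p_bits), start=1):
        if rb != pb:
            return j
    return len(r_bits)

k = 8
strings = list(product((0, 1), repeat=k))
# r is uniform over k-bit strings; by symmetry p can be fixed without changing the average.
p_bits = strings[0b10110101]
average = Fraction(sum(bits_to_decide(r, p_bits) for r in strings), len(strings))
```

The first $j$ bits fail to decide with probability $2^{-j}$, so the average is $\sum_{j \ge 1} j\,2^{-j}$ truncated at $k$ bits, that is $2 - 2^{1-k}$.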

4.1 Description 

Our optimized sampler LazySampleLattice$_{m',m}$ (Alg. 4) works with two
floating-point types, $\mathrm{FP}_m$ (high precision) and $\mathrm{FP}_{m'}$ (low precision), where
$m > m'$. The algorithm works similarly to the original one, except it now works
most of the time at low precision $m'$. The subroutine for sampling over $\mathbb{Z}$ is
replaced by LazySampleZ$_{m',m}$, which takes the usual arguments at low precision,
plus an error bound, and access to high-precision arguments: $\sigma$ is precomputed
and thus requires no special care, whereas access to the high-precision value of $t$ is
given through a function that takes no argument.

This new subroutine LazySampleZ$_{m',m}$ (Alg. 5) works identically to the
original SampleZ$_{m'}$ as long as the decisive comparison is trusted, i.e. as long as
the difference $|r' - p'|$ is higher than the error bound $\delta_p$. Otherwise, the high
precision is triggered, and high-precision inputs are requested through the function
$F$. Then all sample trials are computed with high precision.
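Algorithm 4. LazySampleLattice$_{m',m}$: Lazy Gaussian Sampling over a lattice
input: Same as SampleLattice plus low-precision versions of $\mu$, $B^*$ and the $\sigma_i$'s values:
$\mu', B^{*\prime} : \mathrm{FP}_{m'}^{n \times n}$, $\sigma'_i : \mathrm{FP}_{m'}$, and an error bound $\delta_p$
output: Same as SampleLattice

1: $v, z \leftarrow 0 : \mathbb{Z}^n$ ; $t' \leftarrow$ DecomposeGS$_{m'}$($c$, $B^{*\prime}$) : $\mathrm{FP}_{m'}^n$
2: for $i = n$ downto 1 do
3:   $F_i \leftarrow \{() \mapsto \langle c, b_i^* \rangle / r_i^2 - \langle z, [\mu^t]_i \rangle\}$ : (void $\to \mathrm{FP}_m$)
4:   $z_i \leftarrow$ LazySampleZ$_{m',m}$($\sigma'_i$, $t'_i$, $\tau$, $\delta_p$, $\sigma_i$, $F_i$) : $\mathbb{Z}$
5:   $v \leftarrow v + z_i \cdot b_i : \mathbb{Z}^n$ ; $t' \leftarrow t' - z_i \cdot \mu'_i : \mathrm{FP}_{m'}^n$
6: end for
7: return $v$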




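Algorithm 5. LazySampleZ$_{m',m}$($\sigma'$, $t'$, $\tau$, $\delta_p$ : $\mathrm{FP}_{m'}$, $\sigma : \mathrm{FP}_m$, $F$ : (void $\to \mathrm{FP}_m$))

1: $h' \leftarrow -\pi/\sigma'^2 : \mathrm{FP}_{m'}$ ; $x_{\max} \leftarrow \lceil t' + \tau\sigma' \rceil : \mathbb{Z}$ ; $x_{\min} \leftarrow \lfloor t' - \tau\sigma' \rfloor : \mathbb{Z}$ ; highprec $\leftarrow$ false : bool
2: $x \leftarrow$ RandInt($x_{\min}$, $x_{\max}$) : $\mathbb{Z}$ ; $r' \leftarrow$ RandFloat$_{m'}$() : $\mathrm{FP}_{m'}$
3: if not(highprec) then
4:   $p' \leftarrow \exp(h' \cdot (x - t')^2) : \mathrm{FP}_{m'}$
5:   if $|r' - p'| \le \delta_p$ then {$t \leftarrow F() : \mathrm{FP}_m$ ; $h \leftarrow -\pi/\sigma^2 : \mathrm{FP}_m$ ; highprec $\leftarrow$ true}
6:   else if $r' < p'$ then return $x$
7: end if
8: if highprec then
9:   $r \leftarrow$ ExtRandFloat$_{m' \to m}$($r'$) : $\mathrm{FP}_m$ ; $p \leftarrow \exp(h \cdot (x - t)^2) : \mathrm{FP}_m$
10:  if $r < p$ then return $x$
11: end if
12: Goto Step 2.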
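The control flow of Algorithm 5 can be sketched in Python under simplifying assumptions: low precision is simulated by rounding Python floats to `low_bits` bits of mantissa, the native 53-bit float plays the role of the high precision, and the error bound `delta_p` is picked by hand instead of being derived from Lemma 1. The helper `round_to_bits` and all parameter values are hypothetical.

```python
import math
import random

def round_to_bits(x, bits):
    """Round x to `bits` bits of mantissa (a stand-in for the low-precision type)."""
    if x == 0.0:
        return 0.0
    mant, exp = math.frexp(x)                 # x = mant * 2**exp with 0.5 <= |mant| < 1
    return math.ldexp(round(mant * 2 ** bits) / 2 ** bits, exp)

def lazy_sample_z(sigma_lo, t_lo, tau, delta_p, sigma, fetch_t, low_bits, rng):
    """Return (x, highprec): the sample and whether high precision was triggered."""
    h_lo = round_to_bits(-math.pi / sigma_lo ** 2, low_bits)
    x_max = math.ceil(t_lo + tau * sigma_lo)
    x_min = math.floor(t_lo - tau * sigma_lo)
    highprec = False
    while True:
        x = rng.randint(x_min, x_max)
        r_lo = rng.getrandbits(low_bits) / 2 ** low_bits       # low-precision uniform
        if not highprec:
            p_lo = round_to_bits(math.exp(h_lo * (x - t_lo) ** 2), low_bits)
            if abs(r_lo - p_lo) <= delta_p:
                # Comparison too close to call: fetch the high-precision inputs.
                t, h, highprec = fetch_t(), -math.pi / sigma ** 2, True
            elif r_lo < p_lo:
                return x, highprec
        if highprec:
            # Extend r_lo with fresh low-order bits, then compare at high precision.
            r = r_lo + rng.getrandbits(53 - low_bits) / 2 ** 53
            p = math.exp(h * (x - t) ** 2)
            if r < p:
                return x, highprec

rng = random.Random(2)
t_exact = 0.3141592653589793
runs = [lazy_sample_z(4.0, round_to_bits(t_exact, 16), 6.0, 2.0 ** -10,
                      4.0, lambda: t_exact, 16, rng) for _ in range(2000)]
triggered = sum(1 for _, hp in runs if hp) / len(runs)
```

Passing the high-precision center as a zero-argument function means its cost is only paid in the rare calls where a comparison falls within `delta_p`.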


4.2 Correctness 

We need to determine a proper value for the error bound $\delta_p$ in terms of the
basis and $m'$ (the size of the low precision), to ensure correctness. For this
parameter, the lower the better, since it determines the probability to trigger
the re-computation of $t$ at high precision, as detailed in the next section. The
behavior of the new subroutine is analyzed by the following:

Lemma 1 (Informal, see 0 for a formal statement). The behaviour of 
LazySampleZ$_{m',m}$ given approximate inputs $\sigma \pm \delta_\sigma$ and $t \pm \delta_t$ and $\delta_p$, is similar
to SampleZ$_m$ on input $\sigma$, $t$ under the condition:

S p > 4 erV + 1.7cr S a + (1.7/a)5 t where e' = 2 1-m 

From this lemma, we prove the correctness of LazySampleLattice$_{m',m}$,
summarized by the following result.

Theorem 5. There exist constants $C_\lambda$, $C_\tau$, $C_m$, $C'_m$, $C_{\delta_p}$, such that for any
security parameter $\lambda \ge C_\lambda$, and under Conditions A, the statistical distance between
LazySampleLattice$_{m',m}$ and SampleLattice$_\infty$ is less than $2^{-\lambda}$ on the same
input if the following conditions are satisfied:

$\tau \ge C_\tau \sqrt{\lambda \log n}$
$m \ge C_m + \lambda + 2\log_2(\|B^{-1}\|_s) + \log_2(\hat{\mu}^2 n^4 q \sigma^2 \tau^3)$
$m' \ge C'_m + 2\log_2(\|B^{-1}\|_s) + \log_2(\hat{\mu}^2 n^4 (q + \sigma^2) \tau^3)$
$\delta_p \ge 2^{-k}$ where $k = m' - (C_{\delta_p} + 2\log_2(\|B^{-1}\|_s) + \log_2(\hat{\mu}^2 n^3 \tau^2 q))$

Furthermore, under those conditions, the integers manipulated by the algorithm
can be represented by low-precision floating-point numbers ($\mathrm{FP}_{m'}$) without errors.


4.3 Efficiency 

The error bound $\delta_p$ impacts the efficiency of the optimized sampler as follows:

Lemma 2. Under the conditions of Theorem 5 each call to LazySampleZ$_{m',m}$
triggers high precision with probability less than $12\tau\delta_p$. On the average, the
algorithm LazySampleLattice$_{m',m}$ performs less than $O(n^2 \tau \delta_p)$ high-precision
floating-point operations.

Proof. At each trial performed by LazySampleZ$_{m',m}$, the probability to trigger
high precision is less than $2\delta_p$: indeed it happens only if the randomness $r' \leftarrow
[0, 1)$ falls in the interval $[p' - \delta_p, p' + \delta_p]$. It remains to bound the average number
of trials performed by LazySampleZ$_{m',m}$. The condition of Theorem 5 ensures
that it behaves similarly to SampleZ$_m$. Thus, for a large enough $m$, Fact 4
ensures that the average number of trials is less than $6\tau$.

Triggering high precision during LazySampleZ$_{m',m}$ requires $O(n)$
high-precision FPA operations. This subroutine is called $n$ times, thus on the
average less than $O(n^2 \tau \delta_p)$ high-precision FPA operations are performed. □

This leads to our main result: with Small-Inverse bases, the discrete Gaussian
distribution can be sampled in quasi-quadratic time, with an exponentially small
statistical distance, and no sacrifice on the quality compared to the analysis
of [12].

Theorem 6 (Gaussian sampling in quasi-quadratic time). Let $(\mathcal{C}_n)$ be a
Small-Inverse class of bases. For any implicit function $\lambda$, such that $\lambda \sim n$, and
$\sigma$ polynomial in $n$, there exist implicit functions $m, m', \tau, \delta_p$ of $n$ such that, for
any basis $B \in \mathcal{C}_n$ generating a lattice $L$:

- LazySampleLattice$_{m',m}(B, \sigma, c, \tau, \delta_p)$ runs in expected time $\tilde{O}(n^2)$ without
fast integer arithmetic.

- $\Delta(D_{L,\sigma,c}, \mathrm{LazySampleLattice}_{m',m}(B, \sigma, c, \tau, \delta_p)) \le 2^{-\lambda}$ whenever $\sigma$
verifies $\sigma \ge \|B^*\| \, \eta_\epsilon(\mathbb{Z})$ with $\epsilon = 2^{-\lambda}/(4n)$.

Proof. For a small-inverse class of bases, the conditions of Theorem 5 can be
satisfied with functions verifying:

$\tau = O(\sqrt{n})$, $m = O(n)$, $m' = O(\log n)$, $\delta_p = O(1/n^{5/2})$.

Lemma 2 states that on the average, less than $O(n^2 \tau \delta_p)$ high-precision
operations are performed, which in our case is $O(1)$. Without fast integer arithmetic,
the total complexity is thus less than $O(n^2) \cdot O(m'^2) + O(1) \cdot O(m^2) \le \tilde{O}(n^2)$. □
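To make the last step explicit, the following worked computation substitutes the parameters of the proof. The constants $c_\tau$ and $c_\delta$ hidden in the $O$-notation are named here only for illustration, and the quadratic cost of an operation at a given precision corresponds to schoolbook (non-fast) arithmetic.

```latex
\begin{align*}
n^2 \tau \delta_p
  &\le n^2 \cdot c_\tau \sqrt{n} \cdot c_\delta\, n^{-5/2}
   = c_\tau c_\delta = O(1), \\
O(n^2)\,O(m'^2) + O(1)\,O(m^2)
  &= O(n^2 \log^2 n) + O(n^2) = \tilde{O}(n^2).
\end{align*}
```

The first line is the expected number of high-precision operations; the second adds the cost of the $O(n^2)$ low-precision operations at $m' = O(\log n)$ bits to that of the $O(1)$ high-precision ones at $m = O(n)$ bits.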
5 Speeding Up Peikert’s Offline Algorithm

Peikert [23] recently proposed a different sampling algorithm based on
convolution, which was inspired by NTRUSign’s perturbation countermeasure na-
This algorithm offers a different trade-off than Klein’s algorithm, with slightly
worse constraints on sampling parameters (see m for details). The discrete
Gaussian distribution is obtained by adding two points, one generated by an
offline phase, the other generated by a (cheaper) online phase. The online phase
is essentially a randomized variant of Babai’s round-off algorithm [5], which
only involves small-integer arithmetic when the input is a $q$-ary lattice, and
thus runs in $\tilde{O}(n^2)$ time, and even $\tilde{O}(n)$ in ring settings. The offline phase is
itself essentially the generation of some discrete Gaussian distribution, which
requires long-integer arithmetic, and is not fully analyzed in [23], but seems to
be $\tilde{O}(n^3)$ (even $O(n^4)$ without fast integer arithmetic) like Klein’s algorithm. In
the follow-up work of Micciancio and Peikert [20], a new kind of lattice trapdoor
is introduced to optimize efficiency and geometric quality, which allows an even
faster online phase, but the same kind of offline computations is required. We
refer to this common offline phase as Peikert’s offline algorithm.


5.1 Peikert’s Offline Algorithm 

Let $B$ be the input basis of the lattice for which one wants to generate the discrete
Gaussian distribution. In both [23,20], the offline phase consists of generating a
(centered) discrete Gaussian noise over $\mathbb{Z}^n$ of parameter $\Sigma \in S_n^+$ such that $B^t B +
\Sigma = s I_n$ where $s$ is some appropriate real number: this implies certain constraints
on $B$ which are discussed in [23]. Letting $\Sigma = C^t C$, this distribution $D_{\mathbb{Z}^n,\sqrt{\Sigma}}$
has support $\mathbb{Z}^n$ and density at $x$ proportional to $\rho_{1,0}(x C^{-1})$: in other words,
this is “essentially” the discrete Gaussian distribution $\mathcal{D}$ over the lattice spanned
by $C^{-1}$, since the density of $x \in \mathbb{Z}^n$ is proportional to the density of the lattice
point $x C^{-1}$ in $\mathcal{D}$. The offline-phase algorithm is described in Alg. 6 (from [23]):
it generates this discrete Gaussian distribution by convolution (see [23]), which
is a different strategy than Klein’s algorithm, and has different constraints. The
main idea is to consider a “shift” $\Sigma' = \Sigma - \eta^2 I_n$ of $\Sigma$ such that $\Sigma' \in S_n^+$ (which
implies that $\Sigma > \eta^2 I_n$) and $\eta \ge \eta_\epsilon(\mathbb{Z}^n)$, and to compute a square-root $L$ of
$\Sigma'$, i.e. $\Sigma' = L^t L$. To implement this, it is suggested in [23] to use a Cholesky
decomposition. The parameters selected to reach security $\lambda$ are $\eta = \tau = \eta_\epsilon(\mathbb{Z})$


Algorithm 6. Peikert’s Offline Algorithm

input: $\Sigma \in S_n^+$, a real $\eta \ge \eta_\epsilon(\mathbb{Z}^n)$ such that $\Sigma' = \Sigma - \eta^2 I_n \in S_n^+$ and $\epsilon$ is negligible,
and a square-root $L$ of $\Sigma'$, i.e. $\Sigma' = L^t L$.
output: an integer vector $z \in \mathbb{Z}^n$ following the distribution $D_{\mathbb{Z}^n,\sqrt{\Sigma}}$

1: Choose $x : \mathbb{R}^n$ according to the continuous Gaussian distribution of covariance $I_n$
2: $y = x \cdot L$
3: for $i = 1$ to $n$ do $z_i \leftarrow$ SampleZ$_m$($\eta$, $y_i$, $\tau$)
4: return $z$
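The offline phase can be sketched in Python as follows. The sketch assumes that "covariance $I_n$" is meant in the same normalization as $\rho$, so that each coordinate of $x$ has density proportional to $\exp(-\pi x^2)$, i.e. standard deviation $1/\sqrt{2\pi}$; it also computes the square root by Cholesky decomposition on each call instead of taking it as input. Function names and the toy matrix are illustrative assumptions.

```python
import math
import random

def sample_z(t, sigma, tau, rng):
    """Rejection sampling over Z around t (same idea as Algorithm 1)."""
    h = -math.pi / sigma ** 2
    x_min, x_max = math.floor(t - tau * sigma), math.ceil(t + tau * sigma)
    while True:
        x = rng.randint(x_min, x_max)
        if rng.random() < math.exp(h * (x - t) ** 2):
            return x

def cholesky_upper(S):
    """Return an upper-triangular L with L^t L = S, for S symmetric positive definite."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = S[i][j] - sum(L[k][i] * L[k][j] for k in range(i))
            L[i][j] = math.sqrt(s) if i == j else s / L[i][i]
    return L

def peikert_offline(Sigma, eta, tau, rng):
    n = len(Sigma)
    # Shifted matrix Sigma' = Sigma - eta^2 * I_n, which must stay positive definite.
    shifted = [[Sigma[i][j] - (eta ** 2 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    L = cholesky_upper(shifted)
    # Continuous Gaussian of parameter 1 (assumed normalization, see lead-in).
    x = [rng.gauss(0.0, 1.0 / math.sqrt(2 * math.pi)) for _ in range(n)]
    y = [sum(x[i] * L[i][j] for i in range(n)) for j in range(n)]   # y = x * L
    return [sample_z(y[j], eta, tau, rng) for j in range(n)]

z = peikert_offline([[40.0, 3.0], [3.0, 50.0]], eta=4.0, tau=6.0, rng=random.Random(4))
```

The only non-integer work is the product $y = x \cdot L$ feeding the centers of the one-dimensional samplers, which is the part that the laziness technique targets.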
$= O(\sqrt{\lambda})$. The choice of the floating-point precision is not discussed in [23,20];
however, a quick analysis shows that one should take $m = \lambda + \ell$ where $\ell$ is
logarithmic in $n$, $s$ and $\tau$. Thus, a naive implementation would have a running
time of $O(n^2 \lambda^2)$, the main cost being a (non-structured) matrix-vector product:
that is $n^2$ floating-point operations, at precision $O(\lambda)$.


5.2 Using Laziness in Peikert’s Offline Algorithm 

Like in Klein’s sampling algorithm, the offline phase of Peikert’s algorithm m
only uses non-integer values to compute the input of the SampleZ$_m(\eta, \cdot, \tau)$
subroutine. High-precision bits of this input are useless except with small
probability: one may apply the laziness technique to improve efficiency to $\tilde{O}(n^2)$, by
replacing the subroutine by LazySampleZ$_{m',m}$. We sketch a proof.

The floating-point computation $y_j = \sum_{i=1}^{n} x_i L_{i,j}$ with $m$ bits of precision
produces an error less than $O(n^2 \|x\|_\infty \|L\|_\infty \varepsilon)$ where $\varepsilon = 2^{1-m}$. For $\tau = O(\sqrt{n})$
we have that $|x_i| \le \tau$ with overwhelming probability, and $\|L\|_\infty \le \|L\|_s \le \sigma$
since $L^t L = \Sigma' \le \sigma^2\,\mathrm{Id}$. The error propagation is thus polynomial in $n$, and
Lemma 1 ensures correctness with the following parameters:

$\tau = O(\sqrt{n})$, $m = O(n)$, $m' = O(\log n)$, $\delta_p = \Theta(1/n^{5/2})$.

Similarly to Lemma 2 one easily proves that, on average, less than $O(n^2 \tau \delta_p)$
high-precision operations are performed, which in our case is $O(1)$. Without
fast integer arithmetic, the total complexity is thus less than $O(n^2) \cdot O(m'^2) +
O(1) \cdot O(m^2) \le \tilde{O}(n^2)$.

6 Quasi-Linear Complexity in Ring Settings $\mathcal{R} = \mathbb{Z}_q[X]/(X^b \pm 1)$

For efficiency purposes, lattice cryptography often uses a special class of
“algebraic” lattices arising from polynomial rings, i.e. $\mathcal{R} = \mathbb{Z}_q[X]/(P(X))$ for some
polynomial $P$ of degree $b$. More precisely, the lattices are generated by an
$\mathcal{R}$-basis, and can also be viewed as an integer lattice of dimension $\ell b$ for some
$\ell \ge 1$.

In this section, we show that for the ring settings $\mathcal{R} = \mathbb{Z}_q[X]/(X^b \pm 1)$, it is
possible to achieve quasi-linear complexity using two improvements on top of our
lazy variant of Peikert’s offline phase [23,20]. The first improvement is to use
special square-root algorithms (e.g. Babylonian Method or the Denman-Beavers
iteration 0) to preserve matrix structures, unlike Cholesky decomposition. In
our case, we use block-circulant or block-skew-circulant structures, which are
stable under transposition and multiplication, which implies that $\Sigma' = \Sigma -
\eta^2 I_n = (s - \eta^2) I_n - B^t B$ has the same structure. The second improvement
targets SampleZ.
6.1 Structured Square-Root for $\mathcal{R} = \mathbb{Z}_q[X]/(X^b \pm 1)$

Consider the special ring setting $\mathcal{R} = \mathbb{Z}_q[X]/(X^b \pm 1)$, which includes
$\mathbb{Z}_q[X]/(X^b - 1)$ for the class of NTRU lattices |E|, and some cyclotomic lattices
$\mathbb{Z}_q[X]/(\Phi_m)$, the $m$-th cyclotomic ring, when $m$ is a power of two, made popular
by the hardness results of m-

When $P(X) = X^b - 1$ (resp. $P(X) = X^b + 1$) the integer representation $B \in
\mathcal{M}_{\ell b \times \ell b}(\mathbb{Z})$ of any $\mathcal{R}$-basis is a $b$-block circulant (resp. $b$-block skew-circulant)
matrix, i.e. a matrix composed of $(b \times b)$-blocks of the form:

$\begin{pmatrix} a_1 & a_2 & \cdots & a_b \\ a_b & a_1 & \cdots & a_{b-1} \\ \vdots & & \ddots & \vdots \\ a_2 & a_3 & \cdots & a_1 \end{pmatrix}$

(with the wrapped-around entries negated in the skew-circulant case).
We denote these families by $\mathcal{C}_b$ (resp. $\mathcal{C}_b^-$). These families are stable under ring
operations (addition, product and inverse, when defined) because of the ring
isomorphism with matrices over $\mathcal{R}$. Such isomorphisms also exist for other
polynomials $P$, defining other $b$-block structures. However, circulant and skew-circulant
structures have a key property for our improvement:

Fact 7. Matrix families $\mathcal{C}_b$ and $\mathcal{C}_b^-$ are stable under transposition.

From this, we deduce that $\Sigma' = \Sigma - \eta^2 I_n = (s - \eta^2) I_n - B^t B \in \mathcal{C}_b$ (or $\mathcal{C}_b^-$) when
working in this ring setting. At this point, one would want to find a square root
of $\Sigma'$ that is still structured. Interestingly, the solution can be found in algorithms
that were designed to extract another notion of square root; namely, the
Babylonian Method, or the Denman-Beavers iteration j^|. Indeed, those algorithms
are searching for a $Y$ such that $Y \cdot Y = X$, without symmetry requirement
on $X$, and with no guarantee of convergence in general. Lemma 3 proves that given
as input $X \in S_n^+$, such methods (quickly) converge to some $Y \in S_n^+$ such that
$Y^t \cdot Y = X$.


Definition. The Babylonian Method approximates the limit of the sequence:

$Y_0(X) = I_n$ ; $\quad Y_{k+1}(X) = (Y_k(X) + X \cdot Y_k(X)^{-1})/2 \qquad (2)$

and if this sequence converges to an invertible limit $Y(X)$, it must verify $Y(X) =
\frac{1}{2}(Y(X) + X \cdot Y(X)^{-1})$, which is equivalent to $Y(X) \cdot Y(X) = X$. The
Denman-Beavers iteration is similar, using the sequences:

$Y_0(X) = X$, $\quad Y_{k+1}(X) = (Y_k(X) + Z_k(X)^{-1})/2$
$Z_0(X) = \mathrm{Id}$, $\quad Z_{k+1}(X) = (Z_k(X) + Y_k(X)^{-1})/2$

it verifies the invariant $Y_k \cdot Z_k^{-1} = Z_k^{-1} \cdot Y_k = X$, and if it converges, the limit $Y$
of $Y_k$ verifies $Y \cdot Y = X$.
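The Babylonian iteration (2) and its structure preservation can be tried on a small example. The sketch below uses plain Python lists with a Gauss-Jordan inverse; the helper names, the fixed iteration count and the 3x3 circulant test matrix are illustrative assumptions, and no attempt is made to exploit the circulant structure for speed.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def inverse(A):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def babylonian_sqrt(X, iterations=30):
    """Iterate Y <- (Y + X * Y^-1) / 2 starting from the identity."""
    n = len(X)
    Y = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        XY = matmul(X, inverse(Y))
        Y = [[(Y[i][j] + XY[i][j]) / 2 for j in range(n)] for i in range(n)]
    return Y

def circulant(first_row):
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]

X = circulant([4.0, 1.0, 1.0])   # symmetric positive definite, eigenvalues 6, 3, 3
Y = babylonian_sqrt(X)
```

On this input the limit is again circulant and symmetric, so it is a square root in both senses: $Y \cdot Y = X$ and $Y^t \cdot Y = X$.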
Lemma 3. Let $X \in S_n^+$ be a symmetric positive definite matrix, then the
Babylonian Method, as defined by the sequence $Y_k(X)$ in (2), converges quadratically³
to some $Y(X) \in S_n^+$. Furthermore, if $X \in \mathcal{C}_b$ (resp. $\mathcal{C}_b^-$) then $Y(X) \in \mathcal{C}_b$
(resp. $\mathcal{C}_b^-$), which implies that $Y(X)^t\, Y(X) = X$. Similar results also hold for
the Denman-Beavers iteration W-


Proof (sketch). By induction, write $Y_i(X)$ as $Q D_i Q^t$ for a fixed orthogonal
matrix $Q$ and diagonal matrices $D_i$. Each diagonal entry of $(D_i)$ follows the
Babylonian Square-Root sequence over $\mathbb{R}$, which allows us to prove convergence. Structure
preservation follows from ring and topological closure of $\mathcal{C}_b$ and $\mathcal{C}_b^-$.


6.2 Improved Efficiency 

Assuming the square root $L$ of $\Sigma'$ was precomputed using one of the
structure-preserving algorithms described above, each computation of $y = x \cdot L$ at precision
$m'$ can now be done in time $\tilde{O}(n m'^2)$, but some coordinates may need to be
recomputed at precision $m$. Using a similar analysis as in Sect. 5.2 with:

$\tau = O(\sqrt{n})$, $m = O(n)$, $m' = O(\log n)$, $\delta_p = O(1/n^{7/2})$,

we show that the “average” time spent on the computation of $y = x \cdot L$ is
indeed $\tilde{O}(n)$.

By combining Laziness and Structured-Square-Root, we move the complexity
bottleneck to the LazySampleZ subroutine, which is called $n$ times and requires
$O(\tau) = O(\sqrt{\lambda})$ trials on average. For $\lambda \sim n$, this leads to an overall average
complexity of $\tilde{O}(n^{1.5})$.

To reach quasi-linear complexity we need a third trick, detailed in the full
version j^j. There, we improve the rejection sampling algorithm SampleZ so
that it only needs a constant number of trials on average. This is done by
sampling from a distribution before rejection which is much closer to the target
distribution than the uniform distribution used in SampleZ.

By combining the three techniques, we eventually obtain an implementation
of Peikert’s offline phase which runs in average⁴ quasi-linear time. These results
also apply to the recent variant of Micciancio and Peikert [20].


3 The number of correct bits grows quadratically with the number k of iterations: 
|sfc — Soo| < c 2 -c k for some c, c' > 0. 

4 We explain what we mean by average. As high-precision is triggered independently
with small probability over $n$ trials, the running times of the optimized Klein’s
Sampler and optimized Peikert’s Offline Phase are bounded by some function $\tilde{O}(n^2)$,
except with negligible probability. However, when applying laziness in the ring setting,
triggering high-precision once in the whole algorithm raises this instance’s running
time to $O(n\lambda^2)$: only the average cost is below that bound. And dealing with average
running times is less problematic in an offline phase than in an online phase, which
is more subject to timing attacks.
Abstract. NTRUSign is the most practical lattice signature scheme.
Its basic version was broken by Nguyen and Regev in 2006: one can
efficiently recover the secret key from about 400 signatures. However,
countermeasures have been proposed to repair the scheme, such as the
perturbation used in NTRUSign standardization proposals, and the
deformation proposed by Hu et al. at IEEE Trans. Inform. Theory in 2008.
These two countermeasures were claimed to prevent the NR attack.
Surprisingly, we show that these two claims are incorrect by revisiting the
NR gradient-descent attack: the attack is more powerful than previously
expected, and actually breaks both countermeasures in practice,
e.g. 8,000 signatures suffice to break NTRUSign-251 with one perturbation
as submitted to IEEE P1363 in 2003. More precisely, we explain
why the Nguyen-Regev algorithm for learning a parallelepiped is heuristically
able to learn more complex objects, such as zonotopes and deformed
parallelepipeds.


1 Introduction 

There is growing interest in cryptography based on hard lattice problems (see 
the survey [?]). The field started with the seminal work of Ajtai [2] back in 
1996, and recently got a second wind with Gentry's breakthrough work [?] on 
fully-homomorphic encryption. It offers asymptotic efficiency, potential 
resistance to quantum computers and new functionalities. There has been significant 
progress in provably-secure lattice cryptography in the past few years, but from 
a practical point of view, very few lattice schemes can compete with 
standardized schemes for now. This is especially true in the case of signature schemes, for 
which there is arguably only one realistic lattice alternative: NTRUSign [?], 
which is an optimized instantiation of the Goldreich-Goldwasser-Halevi (GGH) 
signature scheme [?] using the compact lattices introduced in NTRU 
encryption [?], and whose performance is comparable with ECDSA. By comparison, 
signatures have size beyond 10,000 bits (at 80-bit security level) for the most 
efficient provably-secure lattice signature scheme known, namely the recent scheme 
of Lyubashevsky [?]. 

X. Wang and K. Sako (Eds.): ASIACRYPT 2012, LNCS 7658, pp. 433-gSU] 2012. 

(c) International Association for Cryptologic Research 2012 

However, NTRUSign has no provable-security guarantee. In fact, the GGH 
signature scheme and its simplest NTRUSign instantiation were broken at 
EUROCRYPT '06 by Nguyen and Regev [?], who presented a polynomial-time 
key-recovery attack using a polynomial number of signatures: in the case of 
NTRUSign, 400 signatures suffice in practice to disclose the secret key within 
a few hours. In the GGH design, a signature is a lattice point which is 
relatively close to the (hashed) message. Clearly, many lattice points could be 
valid signatures, but GGH selects one which is closely related to the secret key: 
each message-signature pair actually discloses a sample almost uniformly 
distributed in a secret high-dimensional parallelepiped. The NR attack works by 
learning such a parallelepiped: given a polynomial number of samples of the form 
$\sum_{i=1}^{n} x_i \mathbf{b}_i$ where the $x_i$'s are picked uniformly at random from $[-1,1]$ and the 
secret vectors $\mathbf{b}_1, \ldots, \mathbf{b}_n \in \mathbb{R}^n$ are linearly independent, the attack recovers the 
parallelepiped basis $(\mathbf{b}_1, \ldots, \mathbf{b}_n)$, by finding minima of a certain multivariate 
function, thanks to a well-chosen gradient descent. The NR attack motivated 
the search for countermeasures to repair NTRUSign: 

- The very first countermeasure already appeared in half of the parameter 
  choices of NTRU's IEEE P1363.1 standardization proposal [?], the other 
  half being broken by NR. It consists of applying the signature generation 
  process twice, using two different NTRU lattices, the first one being kept 
  secret: here, the secret parallelepiped becomes the Minkowski sum of two 
  secret parallelepipeds, which is a special case of zonotopes. This slows down 
  signature generation, and forces parameters to be increased because the signature 
  obtained is less close to the message. However, no provable security guarantee 
  was known or even expected. In fact, heuristic attacks have been claimed 
  by both the designers of NTRUSign [?] and more recently by Malkin et 
  al. [?], but both are impractical: the most optimistic estimates [?] state 
  that they both require at least $2^{60}$ signatures, and none have been fully 
  implemented. Yet, as a safety precaution, the designers of NTRUSign [?] 
  only claim the security of NTRUSign with perturbation up to 1 million 
  signatures in [?]. Still, breaking this countermeasure was left as an open 
  problem in [?]. 

- In 2008, Hu, Wang and He [?] proposed a simpler and faster countermeasure 
  in IEEE Trans. Inform. Theory, which we call IEEE-IT, where the secret 
  parallelepiped is deformed. Again, the actual security was unknown. 

- Gentry, Peikert and Vaikuntanathan [?] proposed the first provably secure 
  countermeasure for GGH signatures, by using a randomized variant of 
  Babai's nearest plane algorithm. However, this slows down signature 
  generation significantly, and forces parameters to be increased because the signatures 
  obtained are much less close to the message. As a result, the resulting 
  signature for NTRUSign does not seem competitive with classical signatures: 
  no concrete parameter choice has been proposed. 

Our Results. We revisit the Nguyen-Regev gradient-descent attack to show 
that it is much more powerful than previously expected: in particular, an 
optimized NR attack can surprisingly break in practice both NTRU's perturbation 
technique [?] as recommended in standardization proposals [?], and the 
IEEE-IT countermeasure [?]. For instance, we can recover the NTRUSign 
secret key in a few hours, using 8,000 signatures for the original NTRUSign-251 
scheme with one perturbation submitted to IEEE P1363 standardization in 2003, 
or only 5,000 signatures for the latest 80-bit-security parameter set [?] proposed 
in 2010. These are the first successful experiments fully breaking NTRUSign 
with countermeasures. Note that in the perturbation case, we have to slightly 
modify the original NR attack. The warning is clear: our work strongly suggests 
dismissing all GGH/NTRUSign countermeasures which are not supported by 
some provable security guarantee. 

Our work sheds new light on the NR attack. The original analysis of Nguyen 
and Regev does not apply to either of the two NTRUSign countermeasures, and 
it seemed a priori that the NR attack would not work in these cases. We show 
that the NR attack is much more robust than anticipated, by extending the 
original analysis of the Nguyen-Regev algorithm for learning a parallelepiped, 
to tackle more general objects such as zonotopes (to break the NTRUSign 
countermeasure with a constant number of perturbations) or deformed 
parallelepipeds (to break the IEEE-IT countermeasure). For instance, in the zonotope 
case, the parallelepiped distribution $\sum_{i=1}^{n} x_i \mathbf{b}_i$ is replaced by $\sum_{i=1}^{m} x_i \mathbf{v}_i$, where 
$\mathbf{v}_1, \ldots, \mathbf{v}_m \in \mathbb{R}^n$ are secret vectors with $m \ge n$. The key point of the NR attack 
is that all the local minima of a certain multivariate function are connected to 
the directions $\mathbf{b}_i$ of the secret parallelepiped. We show that there is a somewhat 
similar (albeit more complex) phenomenon when the parallelepiped is replaced 
by zonotopes or deformed parallelepipeds: there, we establish the existence of 
local minima connected to the secret vectors spanning the object, but we 
cannot rule out the existence of other minima. Yet, the attack works very well in 
practice, as if there were no other minima. 

Roadmap. In Sect. 2, we recall background on NTRUSign and the NR attack. 
In Sect. 3, we attack NTRU's perturbation countermeasure, by learning a 
zonotope. In Sect. 4, we attack the IEEE-IT countermeasure, by learning a deformed 
parallelepiped. More information is provided in the full version [?]. 

2 Background and Notation 

2.1 Notation 

Sets. $\mathbb{Z}_q$ is the ring of integers modulo $q$. $\mathbb{N}$ and $\mathbb{Z}$ denote the usual sets, $[n]$ 
denotes $\{1, \ldots, n\}$. $\mathbb{S}_n$ is the unit sphere of $\mathbb{R}^n$ for the Euclidean norm $\|\cdot\|$, 
whose inner product is $\langle \cdot, \cdot \rangle$. 

Linear Algebra. Vectors of $\mathbb{R}^n$ will be row vectors denoted by bold lowercase 
letters. A (row) matrix is denoted by $[\mathbf{b}_1, \ldots, \mathbf{b}_n]$. We denote by $\mathcal{M}_{m,n}(\mathcal{R})$ the 
set of $m \times n$ matrices over a ring $\mathcal{R}$. The group of $n \times n$ invertible matrices with 
real coefficients will be denoted by $\mathrm{GL}_n(\mathbb{R})$ and $\mathcal{O}_n(\mathbb{R})$ will denote the subgroup 
of orthogonal matrices. The transpose of a matrix $M$ will be denoted by $M^t$, 
and $M^{-t}$ will mean the inverse of the transpose. For a set $S$ of vectors in $\mathbb{R}^n$ 
and $M \in \mathcal{M}_n(\mathbb{R})$, $S \cdot M$ denotes the set $\{\mathbf{s} \cdot M : \mathbf{s} \in S\}$. We denote by $I_n$ the 
$n \times n$ identity matrix. 

Rounding. We denote by $\lfloor x \rceil$ the closest integer to $x$. Naturally, $\lfloor \mathbf{b} \rceil$ denotes the 
operation applied to all the coordinates of $\mathbf{b}$. 

Distributions. If $X$ is a random variable, we denote by $\mathbb{E}[X]$ its expectation. For 
any set $S$, we denote by $\mathcal{U}(S)$ the uniform distribution over $S$, when applicable. 
If $\mathcal{D}$ is a distribution over $\mathbb{R}^n$, its covariance is the $n \times n$ symmetric positive 
matrix $\mathrm{Cov}(\mathcal{D}) = \mathbb{E}_{\mathbf{x} \leftarrow \mathcal{D}}[\mathbf{x}^t \mathbf{x}]$. The notation $\mathcal{D} \oplus \mathcal{D}'$ denotes the convolution of 
two distributions, that is the distribution of $\mathbf{x} + \mathbf{y}$ where $\mathbf{x} \leftarrow \mathcal{D}$ and $\mathbf{y} \leftarrow \mathcal{D}'$ 
are sampled independently. Furthermore, we denote by $\mathcal{D} \cdot B$ the distribution of 
$\mathbf{x}B$ where $\mathbf{x} \leftarrow \mathcal{D}$. 

Zonotopes and Parallelepipeds. A zonotope is the Minkowski sum of finitely 
many segments. Here, we use centered zonotopes: the zonotope spanned by an 
$m \times n$ row matrix $V = [\mathbf{v}_1, \ldots, \mathbf{v}_m]$ is the set $\mathcal{Z}(V) = \{\sum_{i=1}^{m} x_i \mathbf{v}_i,\ -1 \le x_i \le 1\}$. 
We denote by $\mathcal{D}_{\mathcal{Z}(V)}$ the convolution distribution over $\mathcal{Z}(V)$ obtained by 
picking independently each $x_i$ uniformly at random from $[-1,1]$: in other words, 
$\mathcal{D}_{\mathcal{Z}(V)} = \mathcal{U}([-1,1]^m) \cdot V$, which in general is not the uniform distribution over 
$\mathcal{Z}(V)$. However, in the particular case $V \in \mathrm{GL}_n(\mathbb{R})$, $\mathcal{Z}(V)$ is simply the 
parallelepiped $\mathcal{P}(V)$ spanned by $V$, and $\mathcal{D}_{\mathcal{P}(V)}$ is equal to the uniform distribution 
over $\mathcal{P}(V)$. 

Differentials. Let $f$ be a function from $\mathbb{R}^n$ to $\mathbb{R}$. The gradient of $f$ at $\mathbf{w} \in \mathbb{R}^n$ is 
denoted by $\nabla f(\mathbf{w}) = \left(\frac{\partial f}{\partial x_1}(\mathbf{w}), \ldots, \frac{\partial f}{\partial x_n}(\mathbf{w})\right)$. The Hessian matrix of $f$ at $\mathbf{w} \in \mathbb{R}^n$ 
is denoted by $\mathrm{H} f(\mathbf{w}) = \left(\frac{\partial^2 f}{\partial x_i \partial x_j}(\mathbf{w})\right)_{1 \le i,j \le n}$. 

Running Times. All given running times were measured using a 2.27-GHz Intel 
Xeon E5520 core. 

Lattices. We refer to the survey [?] for a bibliography on lattices. In this paper, 
by the term lattice, we mean a full-rank discrete subgroup of $\mathbb{R}^n$. A non-empty 
set $L \subseteq \mathbb{R}^n$ is a lattice if and only if there exists $B = [\mathbf{b}_1, \ldots, \mathbf{b}_n] \in \mathrm{GL}_n(\mathbb{R})$ such 
that $L = \{\sum_{i=1}^{n} n_i \mathbf{b}_i \mid n_i \in \mathbb{Z}\}$. Any such $B$ is called a basis of $L$, and the 
absolute value of its determinant is the lattice volume $\mathrm{vol}(L)$ of the lattice $L$. 
The closest vector problem (CVP) is the following: given a basis of $L \subseteq \mathbb{Z}^n$ and 
a target $\mathbf{t} \in \mathbb{Q}^n$, find a lattice vector $\mathbf{v} \in L$ minimizing the distance $\|\mathbf{v} - \mathbf{t}\|$. If 
$d$ is the minimal distance, then approximating CVP to a factor $k$ means finding 
$\mathbf{v} \in L$ such that $\|\mathbf{v} - \mathbf{t}\| \le kd$. Bounded Distance Decoding (BDD) is a special 
case of CVP where the distance to the lattice is known to be small. 

2.2 The GGH Signature Scheme 

The GGH scheme [?] works with a lattice $L$ in $\mathbb{Z}^n$. The secret key is a 
non-singular matrix $R \in \mathcal{M}_n(\mathbb{Z})$, with very short row vectors. Following [?], the 
public key is the Hermite normal form (HNF) of $L$. The messages are hashed 
onto a "large enough" subset of $\mathbb{Z}^n$, for instance a large hypercube. Let $\mathbf{m} \in \mathbb{Z}^n$ 
be the hash of the message to be signed. The signer applies Babai's round-off 
CVP approximation algorithm [3] to get a lattice vector close to $\mathbf{m}$: 

$$\mathbf{s} = \lfloor \mathbf{m} R^{-1} \rceil R \qquad (1)$$

so that $\mathbf{s} - \mathbf{m} \in \frac{1}{2}\mathcal{P}(R)$. To verify the signature $\mathbf{s}$ of $\mathbf{m}$, one checks that $\mathbf{s} \in L$ 
using the public basis $B$, and that the distance $\|\mathbf{s} - \mathbf{m}\|$ is sufficiently small. 
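
As a hedged illustration of the round-off signing step, the sketch below applies Babai's round-off, $\mathbf{s} = \lfloor \mathbf{m} R^{-1} \rceil R$, in dimension 2 with exact rational arithmetic, and checks that $\mathbf{s} - \mathbf{m}$ has coordinates in $[-1/2, 1/2]$ with respect to $R$. The matrix `R`, the message and all function names are toy assumptions chosen for readability; they are not parameters of GGH or NTRUSign.

```python
from fractions import Fraction
import math

def inverse_2x2(R):
    """Exact inverse of a 2x2 integer matrix with non-zero determinant."""
    (a, b), (c, d) = R
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def row_times_matrix(x, M):
    """Row vector x times matrix M (the paper uses row vectors)."""
    return [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]

def nearest_integer(x):
    """Closest integer to x (ties are rounded up)."""
    return math.floor(x + Fraction(1, 2))

def sign(m, R):
    """Babai round-off: round the coordinates of m in basis R, map back to the lattice."""
    coords = row_times_matrix(m, inverse_2x2(R))
    return row_times_matrix([nearest_integer(c) for c in coords], R)

def in_half_parallelepiped(e, R):
    """True when e = x R with every coordinate of x in [-1/2, 1/2]."""
    coords = row_times_matrix(e, inverse_2x2(R))
    return all(abs(c) <= Fraction(1, 2) for c in coords)

# Hypothetical toy secret basis and hashed message.
R = [[3, 1], [1, 4]]
m = [10, 8]
s = sign(m, R)
error = [si - mi for si, mi in zip(s, m)]
```

Here `error` is the vector $\mathbf{s} - \mathbf{m}$ that each message-signature pair leaks; collecting many such vectors is what gives the attacker samples from the secret parallelepiped.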


2.3 NTRUSign 


Basic Scheme. NTRUSign [?] is an instantiation of GGH using the compact 
lattices from NTRU encryption [?], which we briefly recall: we refer to [?] for 
more details. In the former NTRU standards [?] proposed to IEEE P1363.1 [?], 
$N = 251$ and $q = 128$. Let $\mathcal{R}$ be the ring $\mathbb{Z}[X]/(X^N - 1)$ whose multiplication is 
denoted by $*$. One computes $(f, g, F, G) \in \mathcal{R}^4$ such that $f * G - g * F = q$ in $\mathcal{R}$ 
and $f$ is invertible mod $q$, where $f$ and $g$ have 0-1 coefficients (with a prescribed 
number of 1s), while $F$ and $G$ have slightly larger coefficients, yet much smaller 
than $q$. This quadruplet is the NTRU secret key. Then the secret basis is the 
following $(2N) \times (2N)$ block-wise circulant matrix: 

$$R = \begin{bmatrix} C(f) & C(g) \\ C(F) & C(G) \end{bmatrix} \quad \text{where } C(f) \text{ denotes } \begin{bmatrix} f_0 & f_1 & \cdots & f_{N-1} \\ f_{N-1} & f_0 & \cdots & f_{N-2} \\ \vdots & & \ddots & \vdots \\ f_1 & \cdots & f_{N-1} & f_0 \end{bmatrix}$$

and $f_i$ denotes the coefficient of $X^i$ of the polynomial $f$. Thus, the lattice 
dimension is $n = 2N$. Due to the special structure of $R$, a single row of $R$ is 
sufficient to recover the whole secret key. Because $f$ is chosen invertible mod $q$, 
the polynomial $h = g/f \bmod q$ is well-defined in $\mathcal{R}$: this is the NTRU public 
key. Its fundamental property is that $f * h = g \bmod q$ in $\mathcal{R}$. The polynomial 
$h$ defines the following (natural) public basis of the lattice: 

$$\begin{bmatrix} I_N & C(h) \\ 0 & q I_N \end{bmatrix}$$

which implies that the lattice volume is $q^N$. 

The messages are assumed to be hashed in $\{0, \ldots, q-1\}^{2N}$. Let $\mathbf{m}$ be such 
a hash. We write $\mathbf{m} = (\mathbf{m}_1, \mathbf{m}_2)$ with $\mathbf{m}_i \in \{0, \ldots, q-1\}^N$. The signature is 
the vector $(\mathbf{s}, \mathbf{t}) \in \mathbb{Z}^{2N}$ which would have been obtained by applying Babai's 
round-off CVP approximation algorithm to $\mathbf{m}$, except that it is computed more 
efficiently using convolution products and can even be compressed (see [?]). We 
described the basic NTRUSign scheme [?], as used in half of the parameter 
choices of the former NTRU standards [?]. 
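
The link between circulant blocks and the ring product can be checked on a toy example. The sketch below assumes the convention that row $i$ of $C(a)$ is the coefficient vector of $a$ cyclically shifted $i$ places to the right, so that a row vector times $C(a)$ equals the product in $\mathbb{Z}[X]/(X^N - 1)$. The helper names and the tiny polynomials are illustrative assumptions, not NTRUSign parameters.

```python
def circulant(a):
    """N x N circulant matrix whose row i is a shifted i places to the right."""
    n = len(a)
    return [[a[(j - i) % n] for j in range(n)] for i in range(n)]

def convolution(x, a):
    """Coefficients of x * a in Z[X]/(X^N - 1) (cyclic convolution)."""
    n = len(a)
    return [sum(x[i] * a[(j - i) % n] for i in range(n)) for j in range(n)]

def row_times_matrix(x, M):
    """Row vector x times matrix M."""
    return [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]

# Toy polynomials with N = 3: x = 1 + X^2 and a = 2 + X.
x = [1, 0, 1]
a = [2, 1, 0]
product_by_matrix = row_times_matrix(x, circulant(a))
product_by_convolution = convolution(x, a)
```

Both computations give the coefficients of $(1 + X^2)(2 + X) = 3 + X + 2X^2$ modulo $X^3 - 1$, which is why signatures can be computed with convolution products instead of generic matrix arithmetic.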


Perturbations. The second half of parameter choices of NTRU standards [?] 
use perturbation techniques [?] to strengthen security, which are described 
in Sect. 2.5. But there is a second change: instead of the standard NTRU secret 
key, one uses the so-called transpose basis, which is simply $R^t$. Then the public 
basis remains the same, except that one defines the public key as $h = F/f = G/g \bmod q$ 
rather than $h = g/f \bmod q$. 

New Parameters. In the latest NTRU article [?], new parameters for 
NTRUSign have been proposed. These include different values of $(N, q)$ and 
a different shape for $f$ and $g$: the coefficients of $f$ and $g$ are now in $\{0, \pm 1\}$, 
rather than $\{0, 1\}$ like in [?]. But the scheme itself has not changed. 

2.4 The Nguyen-Regev Attack 

We briefly recall the Nguyen-Regev attack [?], using a slightly different 
presentation. The NR attack solves the following idealized problem: 

Problem 1 (The Hidden Parallelepiped Problem or HPP). Let $V = [\mathbf{v}_1, \ldots, \mathbf{v}_n] \in \mathrm{GL}_n(\mathbb{R})$ 
and let $\mathcal{P}(V) = \{\sum_{i=1}^{n} x_i \mathbf{v}_i : x_i \in [-1, 1]\}$ be the parallelepiped 
spanned by $V$. The input to the HPP is a sequence of $\mathrm{poly}(n)$ independent 
samples from the uniform distribution $\mathcal{D}_{\mathcal{P}(V)}$. The goal is to find a good 
approximation of the rows of $\pm V$. 

In practice, instead of samples from $\mathcal{D}_{\mathcal{P}(V)}$, the attack uses $2(\mathbf{s} - \mathbf{m})$ for all given 
message-signature pairs $(\mathbf{m}, \mathbf{s})$: this distribution is heuristically close to $\mathcal{D}_{\mathcal{P}(R)}$ 
where $R$ is the secret basis. To recover rows of $R$, the attack simply rounds 
the approximations found to integer vectors. The NR attack has two stages: 
morphing and minimization. 

Morphing the Parallelepiped into a Hypercube. The first stage of the NR attack 
is to transform the hidden parallelepiped into a hidden hypercube (see Alg. 1), 
using a suitable linear transformation $L$. It is based on the following elementary 
lemma [?, Lemmas 1 and 2]: 

Lemma 1. Let $V \in \mathrm{GL}_n(\mathbb{R})$ and denote by $G \in \mathrm{GL}_n(\mathbb{R})$ the symmetric positive 
definite matrix $V^t V$. Then: 

- $\mathrm{Cov}(\mathcal{D}_{\mathcal{P}(V)}) = G/3$. 

- If $L \in \mathrm{GL}_n(\mathbb{R})$ satisfies $LL^t = G^{-1}$ and we let $C = VL$, then $C \in \mathcal{O}_n(\mathbb{R})$ 
  and $\mathcal{D}_{\mathcal{P}(V)} \cdot L = \mathcal{D}_{\mathcal{P}(C)}$. 


Algorithm 1. Morphing($\mathcal{X}$): Morphing a Parallelepiped into a Hypercube 
Input: A set $\mathcal{X}$ of vectors $\mathbf{x} \in \mathbb{R}^n$ sampled from the uniform distribution $\mathcal{D}_{\mathcal{P}(V)}$ over 
a parallelepiped. 

Output: A matrix $L$ such that $\mathcal{D}_{\mathcal{P}(V)} \cdot L$ is close to $\mathcal{D}_{\mathcal{P}(C)}$ for some $C \in \mathcal{O}_n(\mathbb{R})$. 

1: Compute an approximation $G$ of $V^t V$ using the set $\mathcal{X}$, using $\mathrm{Cov}(\mathcal{D}_{\mathcal{P}(V)}) = V^t V/3$ 
(see Lemma 1). 

2: Return $L$ such that $LL^t = G^{-1}$ 
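
The morphing stage can be sketched in dimension 2 with the standard library only. The code below estimates $G \approx 3\,\mathrm{Cov}$ from samples, takes $L$ as the Cholesky factor of $G^{-1}$ (one valid choice satisfying $LL^t = G^{-1}$; the algorithm does not prescribe which $L$ to use), and returns $C = VL$, whose rows should be nearly orthonormal. The secret matrix `V`, the sample count and the seed are hypothetical toy values.

```python
import math
import random

def sample_parallelepiped(V, count, rng):
    """Draw x = x_1 v_1 + x_2 v_2 with each x_i uniform in [-1, 1]."""
    samples = []
    for _ in range(count):
        c = [rng.uniform(-1.0, 1.0) for _ in V]
        samples.append([sum(c[i] * V[i][j] for i in range(2)) for j in range(2)])
    return samples

def estimate_gram(samples):
    """Approximate G = V^t V as 3 * E[x^t x] (the distribution is centered)."""
    n = len(samples)
    return [[3.0 * sum(x[a] * x[b] for x in samples) / n for b in range(2)]
            for a in range(2)]

def morphing(G):
    """Lower-triangular L with L L^t = G^{-1} (2x2 Cholesky factor of the inverse)."""
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    inv = [[G[1][1] / det, -G[0][1] / det], [-G[1][0] / det, G[0][0] / det]]
    l11 = math.sqrt(inv[0][0])
    l21 = inv[1][0] / l11
    l22 = math.sqrt(inv[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def gram_of_rows(C):
    """C C^t, which is the identity exactly when the rows of C are orthonormal."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in C] for ri in C]

def demo_morphing(seed=1, count=20000):
    V = [[2.0, 1.0], [1.0, 3.0]]          # hypothetical secret basis
    rng = random.Random(seed)
    G = estimate_gram(sample_parallelepiped(V, count, rng))
    return mat_mul(V, morphing(G))        # C = V L, close to orthogonal

exact_C = mat_mul([[2.0, 1.0], [1.0, 3.0]], morphing([[5.0, 5.0], [5.0, 10.0]]))
```

With the exact Gram matrix, `exact_C` is orthogonal up to rounding; with the sampled estimate, the deviation from orthogonality shrinks as the number of samples grows.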


This stage is exactly (up to scaling) the classical preprocessing used in 
independent component analysis to make the covariance equal to the identity matrix: 

Lemma 2. Let $G$ be the covariance matrix of a distribution $\mathcal{D}$ over $\mathbb{R}^n$. If $L \in \mathrm{GL}_n(\mathbb{R})$ 
satisfies $LL^t = G^{-1}$, then $\mathrm{Cov}(\mathcal{D} \cdot L) = I_n$. 

Learning a Hypercube. The second stage of the NR attack is to solve the hidden 
hypercube problem, using minimization with a gradient descent (see Alg. 2). 
Nguyen and Regev [?] showed that for any $V \in \mathcal{O}_n(\mathbb{R})$, if $\mathcal{D}$ denotes the 
distribution $\mathcal{D}_{\mathcal{P}(V)}$: 

- The function $\mathrm{mom}_{\mathcal{D},4}(\mathbf{w}) = \mathbb{E}_{\mathbf{x} \leftarrow \mathcal{D}}[\langle \mathbf{x}, \mathbf{w} \rangle^4]$ has exactly $2n$ local minima 
  over the unit sphere $\mathbb{S}_n$, which are located at $\pm \mathbf{v}_1, \ldots, \pm \mathbf{v}_n$, and are global 
  minima. 

- It is possible to find all minima of $\mathrm{mom}_{\mathcal{D},4}(\cdot)$ over $\mathbb{S}_n$ in random polynomial 
  time, using Alg. 2 with parameter $\delta = 3/4$, thanks to the nice shape of 
  $\mathrm{mom}_{\mathcal{D},4}(\cdot)$. Alg. 2 is denoted by Descent($\mathcal{X}$, $\mathbf{w}$, $\delta$) which, given a point $\mathbf{w} \in \mathbb{S}_n$, 
  performs a suitable gradient descent using the sample set $\mathcal{X}$, and returns 
  an approximation of some $\pm \mathbf{v}_i$. 


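Algorithm 2. Descent($\mathcal{X}$, $\mathbf{w}$, $\delta$): Solving the Hidden Hypercube Problem by 
Gradient Descent 

Input: A set $\mathcal{X}$ of samples from the distribution $\mathcal{D}_{\mathcal{P}(V)}$ where $V \in \mathcal{O}_n(\mathbb{R})$, a vector $\mathbf{w}$ 
chosen uniformly at random from $\mathbb{S}_n$, and a descent parameter $\delta$. 

Output: An approximation of some row of $\pm V$. 

1: Compute an approximation $\mathbf{g}$ of the gradient $\nabla \mathrm{mom}_{V,4}(\mathbf{w})$ using $\mathcal{X}$. 
2: Let $\mathbf{w}_{new} = \mathbf{w} - \delta \mathbf{g}$. 
3: Divide $\mathbf{w}_{new}$ by its Euclidean norm $\|\mathbf{w}_{new}\|$. 
4: if $\mathrm{mom}_{V,4}(\mathbf{w}_{new}) \ge \mathrm{mom}_{V,4}(\mathbf{w})$, where the moments are approximated using $\mathcal{X}$, then 
5:   return the vector $\mathbf{w}$. 
6: else 
7:   Replace $\mathbf{w}$ by $\mathbf{w}_{new}$ and go back to Step 1. 
8: end if 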

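The whole NR attack is summarized by Alg. 3. 


Algorithm 3. SolveHPP($\mathcal{X}$): Learning a Parallelepiped [?] 
Input: A set $\mathcal{X}$ of vectors $\mathbf{x} \in \mathbb{R}^n$ sampled from $\mathcal{D}_{\mathcal{P}(V)}$, where $V \in \mathrm{GL}_n(\mathbb{R})$. 

Output: An approximation of a random row vector of $\pm V$. 
1: $L$ := Morphing($\mathcal{X}$) using Alg. 1 
2: $\mathcal{X}$ := $\mathcal{X} \cdot L$ 
3: Pick $\mathbf{w}$ uniformly at random from $\mathbb{S}_n$ 
4: Compute $\mathbf{r}$ := Descent($\mathcal{X}$, $\mathbf{w}$, $\delta$) $\in \mathbb{S}_n$ using Alg. 2: use $\delta = 3/4$ in theory and 
$\delta = 0.7$ in practice. 
5: Return $\mathbf{r} L^{-1}$ 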
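
A minimal sketch of the descent stage on a hidden hypercube in dimension 3 follows. It assumes the samples are already morphed, so the hidden matrix is orthogonal and the morphing step is skipped. The gradient of the empirical fourth moment is $\frac{1}{|\mathcal{X}|}\sum_{\mathbf{x}} 4\langle \mathbf{x}, \mathbf{w}\rangle^3 \mathbf{x}$. The matrix `V`, the sample count, the seed and the `max_iter` safety cap are assumptions added for the illustration; they are not part of the published algorithm.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(u):
    norm = math.sqrt(dot(u, u))
    return [a / norm for a in u]

def moment4(samples, w):
    """Empirical fourth moment E[<x, w>^4]."""
    return sum(dot(x, w) ** 4 for x in samples) / len(samples)

def grad_moment4(samples, w):
    """Empirical gradient E[4 <x, w>^3 x]."""
    g = [0.0] * len(w)
    for x in samples:
        c = 4.0 * dot(x, w) ** 3
        for i in range(len(w)):
            g[i] += c * x[i]
    return [gi / len(samples) for gi in g]

def descent(samples, w, delta=0.7, max_iter=100):
    """Gradient descent on the unit sphere; stops when the moment no longer decreases."""
    current = moment4(samples, w)
    for _ in range(max_iter):            # max_iter is a safety cap (assumption)
        g = grad_moment4(samples, w)
        w_new = normalize([wi - delta * gi for wi, gi in zip(w, g)])
        new = moment4(samples, w_new)
        if new >= current:
            return w
        w, current = w_new, new
    return w

def demo_descent(seed=0, count=4000):
    # Hypothetical hidden orthogonal matrix (rows are orthonormal).
    V = [[1 / 3, 2 / 3, 2 / 3], [2 / 3, 1 / 3, -2 / 3], [2 / 3, -2 / 3, 1 / 3]]
    rng = random.Random(seed)
    samples = []
    for _ in range(count):
        c = [rng.uniform(-1.0, 1.0) for _ in V]
        samples.append([sum(c[i] * V[i][j] for i in range(3)) for j in range(3)])
    w0 = normalize([rng.gauss(0.0, 1.0) for _ in range(3)])
    return descent(samples, w0), V
```

Calling `demo_descent()` returns a unit vector together with the hidden matrix; the returned vector should be nearly parallel to one of the rows, up to sign. Which row is found depends on the random starting point.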


Shrinking the number of NTRUSign signatures. In practice, the NR attack 
requires a polynomial number of signatures, but it is possible to experimentally 
decrease this amount by a linear factor [?], using a well-known symmetry of 
NTRU lattices. We define the NTRUSign symmetry group $S_N^{\mathrm{NTRU}}$ as the group 
spanned by $\sigma \in \mathcal{O}_n(\mathbb{R})$: $(x_1, \ldots, x_N \mid y_1, \ldots, y_N) \mapsto (x_2, \ldots, x_N, x_1 \mid y_2, \ldots, y_N, y_1)$. 
If $L$ is the NTRU lattice, then $\sigma(L) = L$. Furthermore, $(\sigma(\mathbf{m}), \sigma(\mathbf{s}))$ follows 
the same distribution as uniformly random $(\mathbf{m}, \mathbf{s})$. So, any pair $(\mathbf{m}, \mathbf{s})$ gives rise 
to $N$ parallelepiped samples. This technique also allows an $N$-factor speedup for 
covariance computation, which is the most time-consuming part of the attack. 
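
The symmetry is straightforward to apply in code. The sketch below assumes each sample is stored as a list of length $2N$, with the two halves rotated simultaneously by one position; the function names and the toy vector are illustrative.

```python
def rotate(v, N):
    """Apply sigma: cyclically shift both halves of v by one position."""
    x, y = v[:N], v[N:]
    return x[1:] + x[:1] + y[1:] + y[:1]

def orbit(v, N):
    """All N images of v under the powers of sigma."""
    images, current = [], list(v)
    for _ in range(N):
        images.append(current)
        current = rotate(current, N)
    return images

# Toy vector with N = 3 standing in for one sample 2(s - m).
sample = [1, 2, 3, 10, 20, 30]
expanded = orbit(sample, 3)
```

Each transcript entry is thus expanded into $N$ samples before the covariance and moment estimates are computed.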

2.5 Countermeasures 

NTRUSign perturbation: Summing Parallelepipeds. Roughly speaking, these 
techniques perturb the hashed message $\mathbf{m}$ before signing it with the NTRU 
secret basis. More precisely, the hashed message $\mathbf{m}$ is first signed using a second 
NTRU secret basis (of another NTRU lattice, which is kept secret), and the 
resulting signature is then signed as before. Heuristically, the effect on the sample 
distribution of the transcript is as follows: if $R$ and $R'$ are the two secret bases, 
the distribution of $\mathbf{s} - \mathbf{m}$ becomes the convolution $\mathcal{D}_{\mathcal{P}(R)} \oplus \mathcal{D}_{\mathcal{P}(R')}$, i.e. a natural 
distribution over the Minkowski sum of the two parallelepipeds obtained by 
adding the uniform distributions of both parallelepipeds. 

IEEE-IT perturbation: Parallelepiped Deformation. Hu et al. [?] suggested 
another approach to secure NTRUSign in the journal IEEE Trans. IT. Their 
definitions are specific to NTRUSign bases, but they can be generalized to GGH, 
and we call this technique "Parallelepiped deformation". Let $\delta : [-1/2, 1/2)^n \to \mathbb{Z}^n$ 
be a function, possibly secret-key dependent. The signature generation (1) is 
replaced by: 



If $\delta$ outputs small integer vectors, then the signature $\mathbf{s}$ is still valid. The 
associated deformation function is $d_\delta(\mathbf{x}) = \mathbf{x} + \delta(\mathbf{x})$. The sample distribution of $\mathbf{s} - \mathbf{m}$ 
is deformed in the following way: $d_\delta(U_n) \cdot R$ where $d_\delta(U_n)$ denotes the 
distribution of $\mathbf{x} + \delta(\mathbf{x})$ with $\mathbf{x} \leftarrow U_n$. In [?], the deformation $\delta_{\mathrm{IEEE}}$ for a NTRUSign 
secret key $(f, g, F, G)$ is as follows: 

- Let $U \subseteq [N]$ be the set of indexes $u$ such that the $u$-th entry of $f + g + F + G$ 
  is 1 modulo 2, and let $A = \#U$. On average, $A \approx N/2$, and it is assumed 
  that $A \ge 25$, otherwise a new secret key must be generated. 

- Let $1 \le u_1 < u_2 < \cdots < u_A \le N$ be the elements of $U$. For $i \notin [A]$, $u_i$ 
  denotes $u_{(i \bmod A)}$. 

- Let the input of $\delta_{\mathrm{IEEE}}$ be the concatenation of two vectors $\mathbf{x}, \mathbf{y} \in [-1/2, 1/2)^N$. 
  Then the $i$-th entry of $\delta_{\mathrm{IEEE}}(\mathbf{x}|\mathbf{y})$ is: 

$$\begin{cases} s(x_{u_j}, y_{u_j}, y_{u_{j+1}}, y_{u_{j+3}}, y_{u_{j+7}}, y_{u_{j+12}}) & \text{if } i = u_j \in U \\ 0 & \text{if } i \notin U \end{cases}$$

$$\text{where } s(a_0, \ldots, a_5) = \begin{cases} 1 & \text{if } a_i < 0 \text{ for all } i \\ -1 & \text{if } a_i > 0 \text{ for all } i \\ 0 & \text{otherwise} \end{cases}$$

Gaussian Sampling. Gentry et al. [?] described the first provably secure 
countermeasure: Gaussian sampling. In previous schemes, the distribution of $\mathbf{s} - \mathbf{m}$ 
was related to the secret key. In [?], the distribution becomes independent of 
the secret key: it is some discrete Gaussian distribution, which gives rise to a 
security proof in the random-oracle model, under the assumption that finding 
close vectors is hard in the NTRU lattice. Unfortunately, this countermeasure 
is not very competitive in practice: the sampling algorithm [?] is much less 
efficient than NTRUSign generation, and the new signature is less close to the 
message, which forces parameters to be increased. But its efficiency has recently been 
improved, see [?]. 

3 Learning a Zonotope: Breaking NTRUSign with 
Perturbations 

In Sect. 3.1, we introduce the hidden zonotope problem (HZP), which is a 
natural generalization of the hidden parallelepiped problem (HPP), required to break 
NTRUSign with perturbations. In Sect. 3.2, we explain why the Nguyen-Regev 
HPP algorithm (Alg. 3) can heuristically solve the HZP, in cases that include 
NTRUSign, provided that Step 5 is slightly modified. Yet, the approximations 
obtained by the algorithm are expected to be worse than in the non-perturbed 
case, so we use a folklore meet-in-the-middle algorithm for BDD in NTRU 
lattices, which is described in [?]. Finally, in Sect. 3.3, we present experimental 
results with our optimized NR attack which show that NTRUSign with one 
(or slightly more) perturbation(s) is completely insecure, independently of the 
type of basis. In particular, we completely break the original NTRUSign 
proposed to IEEE P1363 standardization [?]: only one half of the parameter sets 
was previously broken in [?]. 

3.1 The Hidden Zonotope Problem 

Assume that one applies $k - 1$ NTRUSign perturbations as a 
countermeasure, which corresponds to $k$ NTRUSign lattices $L_1, \ldots, L_k$ (with secret bases 
$R_1, \ldots, R_k$) where only $L_k$ is public. One signs a hashed message $\mathbf{m} \in \mathbb{Z}^n$ 
by computing $\mathbf{s}_1 \in L_1$ such that $\mathbf{s}_1 - \mathbf{m} \in \frac{1}{2}\mathcal{P}(R_1)$, then $\mathbf{s}_2 \in L_2$ such that 
$\mathbf{s}_2 - \mathbf{s}_1 \in \frac{1}{2}\mathcal{P}(R_2)$, $\ldots$, and finally $\mathbf{s}_k \in L_k$ such that $\mathbf{s}_k - \mathbf{s}_{k-1} \in \frac{1}{2}\mathcal{P}(R_k)$. It 
follows that $\mathbf{s}_k$ is somewhat close to $\mathbf{m}$, because $\mathbf{s}_k - \mathbf{m}$ is in the Minkowski sum 
$\frac{1}{2}\mathcal{P}(R_1) + \frac{1}{2}\mathcal{P}(R_2) + \cdots + \frac{1}{2}\mathcal{P}(R_k)$, which is a zonotope spanned by $\frac{1}{2}R_1, \ldots, \frac{1}{2}R_k$. 
And heuristically, the distribution of $2(\mathbf{s}_k - \mathbf{m})$ is the convolution of all the $k$ 
uniform distributions $\mathcal{D}_{\mathcal{P}(R_i)}$. In other words, similarly to the perturbation-free 
case, an attacker wishing to recover the secret key of a GGH-type signature 
scheme with perturbations, using a polynomial number of signatures, is faced 
with the following problem with $m = kn$: 

Problem 2 (The Hidden Zonotope Problem or HZP). Let $m \ge n$ be 
integers, and $V = [\mathbf{v}_1, \ldots, \mathbf{v}_m]$ be an $m \times n$ row matrix of rank $n$. The 
input to the HZP is a sequence of $\mathrm{poly}(n, m)$ independent samples from $\mathcal{D} = \mathcal{D}_{\mathcal{Z}(V)}$ 
over $\mathbb{R}^n$, which is the convolution distribution over the zonotope $\mathcal{Z}(V) = \{\sum_{i=1}^{m} x_i \mathbf{v}_i,\ -1 \le x_i \le 1\}$ 
spanned by $V$. The goal is to find a good 
approximation of the rows of $\pm V$. 

Here, we assume $V$ to have rank $n$, because this is the setting of NTRUSign 
with perturbation, and because the HPP is simply the HZP with $m = n$. 

3.2 Extending the Nguyen-Regev Analysis to Zonotopes 

Here, we study the behavior of the original Nguyen-Regev algorithm for learning 
a parallelepiped (SolveHPP($\mathcal{X}$), Alg. 3) on a HZP instance, that is, when the 
secret matrix $V$ is not necessarily square, but is an arbitrary $m \times n$ matrix of 
rank $n$ with $m \ge n$. To do this, we need to change the analysis of Nguyen and 
Regev [?], and we will have to slightly change Alg. 3 to make the attack still 
work: Alg. 4 is the new algorithm. Recall that the input distribution $\mathcal{D}_{\mathcal{Z}(V)}$ is 
formed by $\sum_{i=1}^{m} x_i \mathbf{v}_i$ where the $x_i$'s are uniformly chosen in $[-1,1]$. We study 
how the two stages of the NR attack behave for $\mathcal{D}_{\mathcal{Z}(V)}$. 

Morphing Zonotopes. We start with a trivial adaptation of Lemma 1 to 
zonotopes: 

Lemma 3. Let $V$ be an $m \times n$ matrix over $\mathbb{R}$ of rank $n$. Let $G$ be the symmetric 
definite positive matrix $V^t V$. Then: 

- $\mathrm{Cov}(\mathcal{D}_{\mathcal{Z}(V)}) = G/3$. 

- If $L \in \mathrm{GL}_n(\mathbb{R})$ satisfies $LL^t = G^{-1}$ and we let $C = VL$, then $C^t C = I_n$ and 
  $\mathcal{D}_{\mathcal{Z}(V)} \cdot L = \mathcal{D}_{\mathcal{Z}(C)}$. 

Lemma 3 shows that if we apply Morphing($\mathcal{X}$) (Alg. 1) to samples from $\mathcal{D}_{\mathcal{Z}(V)}$ 
(rather than $\mathcal{D}_{\mathcal{P}(V)}$), the output transformation $L$ will be such that $\mathcal{D}_{\mathcal{Z}(V)} \cdot L$ is 
close to $\mathcal{D}_{\mathcal{Z}(C)}$ for some $m \times n$ matrix $C$ such that $C^t C = I_n$. 

In other words, the effect of Step 2 in SolveHPP($\mathcal{X}$) (Alg. 3) is to make 
the zonotope matrix $V$ have orthonormal columns: $V^t V = I_n$. The following 
lemma gives elementary properties of such matrices, which will be useful for our 
analysis: 

Lemma 4. Let $V$ be an $m \times n$ row matrix $[\mathbf{v}_1, \ldots, \mathbf{v}_m]$ such that $V^t V = I_n$. 
Then: 

- $\|\mathbf{w}\|^2 = \sum_{i=1}^{m} \langle \mathbf{w}, \mathbf{v}_i \rangle^2$ for all $\mathbf{w} \in \mathbb{R}^n$. 

- $\|\mathbf{v}_i\| \le 1$ for all $1 \le i \le m$. 

- $\sum_{i=1}^{m} \|\mathbf{v}_i\|^2 = n$ and $\mathrm{Exp}_{\mathbf{x} \leftarrow \mathcal{U}(\mathbb{S}_m)}(\|\mathbf{x}V\|^2) = n/m$. 

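Learning an "Orthogonal" Zonotope. Nguyen and Regev [?] used the target 
function $\mathrm{mom}_{\mathcal{D},4}(\mathbf{w}) = \mathbb{E}_{\mathbf{x} \leftarrow \mathcal{D}}[\langle \mathbf{x}, \mathbf{w} \rangle^4]$ for $\mathbf{w} \in \mathbb{S}_n$, $\mathcal{D} = \mathcal{D}_{\mathcal{P}(V)}$ and $V \in \mathcal{O}_n(\mathbb{R})$ 
to recover the hidden hypercube. We need to study this function when $\mathcal{D}$ is 
the zonotope distribution $\mathcal{D} = \mathcal{D}_{\mathcal{Z}(V)}$ to recover the hidden zonotope. Nguyen 
and Regev gave elementary formulas for $\mathrm{mom}_{\mathcal{D},4}$ and $\nabla \mathrm{mom}_{\mathcal{D},4}$ when $\mathcal{D} = \mathcal{D}_{\mathcal{P}(V)}$ 
and $V \in \mathcal{O}_n(\mathbb{R})$, which can easily be adapted to the zonotope distribution 
$\mathcal{D}_{\mathcal{Z}(V)}$ if $V^t V = I_n$, as follows: 

Lemma 5. Let $V$ be an $m \times n$ matrix over $\mathbb{R}$ such that $V^t V = I_n$, and $\mathcal{D}$ be the 
convolution distribution $\mathcal{D}_{\mathcal{Z}(V)}$ over the zonotope spanned by $V$. Then, for any 
$\mathbf{w} \in \mathbb{R}^n$: 

$$\mathrm{mom}_{\mathcal{D},4}(\mathbf{w}) = \frac{1}{3}\|\mathbf{w}\|^4 - \frac{2}{15}\sum_{i=1}^{m} \langle \mathbf{v}_i, \mathbf{w} \rangle^4$$

$$\nabla \mathrm{mom}_{\mathcal{D},4}(\mathbf{w}) = \frac{4}{3}\mathbf{w} - \frac{8}{15}\sum_{i=1}^{m} \langle \mathbf{v}_i, \mathbf{w} \rangle^3 \mathbf{v}_i \quad \text{if } \mathbf{w} \in \mathbb{S}_n$$

Corollary 1. Under the same hypotheses as Lemma 5, the minima over $\mathbb{S}_n$ 
of the function $\mathrm{mom}_{\mathcal{D},4}(\mathbf{w})$ are the maxima (over $\mathbb{S}_n$) of $f(\mathbf{w}) = \sum_{i=1}^{m} f_{\mathbf{v}_i}(\mathbf{w})$ 
where $f_{\mathbf{v}}(\mathbf{w}) = \langle \mathbf{v}, \mathbf{w} \rangle^4$ is defined over $\mathbb{R}^n$. 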
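
The closed form for the fourth moment can be verified exactly. Writing $a_i = \langle \mathbf{v}_i, \mathbf{w} \rangle$, the quantity $\mathbb{E}[(\sum_i x_i a_i)^4]$ for independent $x_i$ uniform in $[-1,1]$ equals $\frac{1}{3}(\sum_i a_i^2)^2 - \frac{2}{15}\sum_i a_i^4$, and $\sum_i a_i^2 = \|\mathbf{w}\|^2$ when $V^t V = I_n$. The sketch below checks this identity by brute-force expansion with rational arithmetic, using $\mathbb{E}[x^k] = 1/(k+1)$ for even $k$ and $0$ for odd $k$. The function names and test values are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def uniform_moment(k):
    """E[x^k] for x uniform in [-1, 1]."""
    return Fraction(1, k + 1) if k % 2 == 0 else Fraction(0)

def fourth_moment_by_expansion(a):
    """E[(sum_i x_i a_i)^4], expanding the power over all index 4-tuples."""
    total = Fraction(0)
    for idx in product(range(len(a)), repeat=4):
        coefficient = Fraction(1)
        for i in idx:
            coefficient *= a[i]
        expectation = Fraction(1)
        for i in set(idx):               # independence: moments multiply
            expectation *= uniform_moment(idx.count(i))
        total += coefficient * expectation
    return total

def fourth_moment_closed_form(a):
    s2 = sum(x * x for x in a)
    s4 = sum(x ** 4 for x in a)
    return Fraction(1, 3) * s2 * s2 - Fraction(2, 15) * s4

coefficients = [Fraction(1, 2), Fraction(-1, 3), Fraction(2, 5), Fraction(1, 7)]
```

The two functions agree on any rational input, which is a direct check of the coefficients $1/3$ and $2/15$.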

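In [?, Lemma 3], Nguyen and Regev used Lagrange multipliers to show that 
when $V \in \mathcal{O}_n(\mathbb{R})$, the local minima of $\mathrm{mom}_{\mathcal{D}_{\mathcal{P}(V)},4}$ were located at $\pm \mathbf{v}_1, \ldots, \pm \mathbf{v}_n$, 
and these minima are clearly global minima. However, this argument breaks 
down when $V$ is a rectangular $m \times n$ matrix of rank $n$ such that $V^t V = I_n$. To 
tackle the zonotope case, we use a different argument, which requires studying 
each function $f_{\mathbf{v}_i}(\mathbf{w}) = \langle \mathbf{v}_i, \mathbf{w} \rangle^4$ individually: 

Lemma 6. Let $\mathbf{v} \in \mathbb{R}^n$ and $f_{\mathbf{v}}(\mathbf{w}) = \langle \mathbf{v}, \mathbf{w} \rangle^4$ for $\mathbf{w} \in \mathbb{R}^n$. Then: 

1. The gradient and Hessian matrix of $f_{\mathbf{v}}$ are $\nabla f_{\mathbf{v}}(\mathbf{w}) = 4\langle \mathbf{w}, \mathbf{v} \rangle^3 \cdot \mathbf{v}$ and 
$\mathrm{H} f_{\mathbf{v}}(\mathbf{w}) = 12\langle \mathbf{w}, \mathbf{v} \rangle^2 \cdot \mathbf{v}^t \mathbf{v}$. 

2. There are only two local maxima of $f_{\mathbf{v}}$ over $\mathbb{S}_n$, which are located at $\pm \mathbf{v}/\|\mathbf{v}\|$, 
and their value is $\|\mathbf{v}\|^4$. 

3. The local minima of $f_{\mathbf{v}}$ over $\mathbb{S}_n$ are located on the hyperplane orthogonal to 
$\mathbf{v}$, and their value is 0. 

4. The mean value of $f_{\mathbf{v}}$ over $\mathbb{S}_n$ is $3\|\mathbf{v}\|^4/(n(n+2))$. 

This already gives a different point of view from Nguyen and Regev in the special 
case where $V \in \mathcal{O}_n(\mathbb{R})$: for all $1 \le j \le n$, $\mathbf{v}_j$ is a local maximum of $f_{\mathbf{v}_j}$ and a 
local minimum of $f_{\mathbf{v}_i}$ for all $i \ne j$ because $\mathbf{v}_i \perp \mathbf{v}_j$; and therefore $\pm \mathbf{v}_1, \ldots, \pm \mathbf{v}_n$ 
are local extrema of $\mathrm{mom}_{\mathcal{D}_{\mathcal{P}(V)},4}$. 

In the general case where $V$ is an $m \times n$ matrix such that $V^t V = I_n$, let 
$\mathbf{d}_j = \mathbf{v}_j/\|\mathbf{v}_j\| \in \mathbb{S}_n$ for $1 \le j \le m$. The direction $\mathbf{d}_j$ is a local maximum 
of $f_{\mathbf{v}_j}$ over $\mathbb{S}_n$. On the other hand, $f_{\mathbf{v}_i}(\mathbf{d}_j)$ is likely to be small for $i \ne j$. 
This suggests that $\mathbf{d}_j$ should be very close to a local maximum of the whole 
sum $\sum_{i=1}^{m} f_{\mathbf{v}_i}$, provided that the local maximum $\|\mathbf{v}_j\|^4$ of $f_{\mathbf{v}_j}$ is somewhat 
larger than $\sum_{i \ne j} f_{\mathbf{v}_i}(\mathbf{d}_j)$. In fact, this local maximum $\mathbf{d}_j$ is intuitively shifted by 
$\mathbf{g}/(2\|\mathbf{v}_j\|^4)$ where $\mathbf{g}$ is the gradient of $\sum_{i \ne j} f_{\mathbf{v}_i}$ at $\mathbf{d}_j$, because this is exactly 
what happens for its second-order Taylor approximation. This is formalized by 
our main result, which provides a sufficient condition on $V$ guaranteeing that a 
given direction $\mathbf{v}_j/\|\mathbf{v}_j\|$ is close to a local minimum of $\mathrm{mom}_{\mathcal{D}_{\mathcal{Z}(V)},4}$: 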


444 L. Ducas and P.Q. Nguyen 


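Theorem 3 (Local Minima for Zonotopes). Let $V$ be an $m \times n$ matrix over 
$\mathbb{R}$ such that $V^t V = I_n$. Assume that there is $\alpha \ge 1$ such that $V$ is 
$\alpha$-weakly-orthogonal, that is, its $m$ rows satisfy for all $i \ne j$: $|\langle \mathbf{v}_i, \mathbf{v}_j \rangle| \le \alpha \|\mathbf{v}_i\| \|\mathbf{v}_j\| / \sqrt{n}$. 
Let $1 \le j \le m$ and $0 < \varepsilon < 1/\sqrt{2}$ such that: 

$$\|\mathbf{v}_j\|^4 \varepsilon > 6\left(\frac{\alpha}{\sqrt{n}} + \varepsilon\right)^2 \varepsilon + \frac{4}{\|\mathbf{v}_j\|^3} \Big\| \sum_{i \ne j} \langle \mathbf{v}_j, \mathbf{v}_i \rangle^3 \mathbf{v}_i \Big\| \qquad (3)$$

which holds in particular if $\|\mathbf{v}_j\| \ge$ [...] and $\varepsilon =$ [...] $< 1/\sqrt{2}$. Then, 
over the unit sphere, the function $\mathrm{mom}_{\mathcal{D}_{\mathcal{Z}(V)},4}$ has a local minimum at some 
point $\mathbf{m}_j \in \mathbb{S}_n$ such that $\mathbf{m}_j$ is close to the direction of $\mathbf{v}_j$, namely: 

$$\Big\langle \mathbf{m}_j, \frac{\mathbf{v}_j}{\|\mathbf{v}_j\|} \Big\rangle > 1 - \frac{\varepsilon^2}{2} \quad \text{and} \quad \Big\| \mathbf{m}_j - \frac{\mathbf{v}_j}{\|\mathbf{v}_j\|} \Big\| < \varepsilon.$$

And the local minimum $\mathrm{mom}_{\mathcal{D}_{\mathcal{Z}(V)},4}(\mathbf{m}_j)$ discloses an approximation of $\|\mathbf{v}_j\|$, 
namely: 

$$\Big| \mathrm{mom}_{\mathcal{D}_{\mathcal{Z}(V)},4}(\mathbf{m}_j) - \Big( \frac{1}{3} - \frac{2}{15}\|\mathbf{v}_j\|^4 \Big) \Big| < \text{[...]}$$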



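Proof. (Sketch of the proof in [?]) Let $B = \{\mathbf{w} \in \mathbb{S}_n : \|\mathbf{w} - \mathbf{d}_j\| < \varepsilon\}$ be the 
open ball of $\mathbb{S}_n$ of radius $\varepsilon$, where $\mathbf{d}_j = \mathbf{v}_j/\|\mathbf{v}_j\| \in \mathbb{S}_n$. Notice that for all $\mathbf{w} \in \mathbb{S}_n$: 

$$\|\mathbf{w} - \mathbf{d}_j\|^2 = \|\mathbf{w}\|^2 + \|\mathbf{d}_j\|^2 - 2\langle \mathbf{w}, \mathbf{d}_j \rangle = 2(1 - \langle \mathbf{w}, \mathbf{d}_j \rangle).$$

Therefore $B = \{\mathbf{w} \in \mathbb{S}_n : \langle \mathbf{d}_j, \mathbf{w} \rangle > 1 - \varepsilon^2/2\}$, whose closure and boundary are 
denoted respectively by $\bar{B}$ and $\partial B$. Recall that $f = \sum_{i=1}^{m} f_{\mathbf{v}_i}$. We will prove the 
following property: 

$$\forall \mathbf{w} \in \partial B, \quad f(\mathbf{w}) < f(\mathbf{d}_j), \qquad (4)$$

which allows us to conclude the proof of Th. 3. Indeed, by continuity, the restriction 
of $f$ to $\bar{B}$ has a global maximum at some point $\mathbf{m}_j \in \bar{B}$. And (4) implies that 
$\mathbf{m}_j \notin \partial B$, therefore $\mathbf{m}_j \in B$. Thus, $\mathbf{m}_j$ is a global maximum of $f$ over the 
open set $B$: in other words, $\mathbf{m}_j$ is a local maximum of $f$, and therefore a local 
minimum of $\mathrm{mom}_{\mathcal{D},4}$. Furthermore, by definition of $B$, we have: $\|\mathbf{m}_j - \mathbf{d}_j\| < \varepsilon$ 
and $\langle \mathbf{d}_j, \mathbf{m}_j \rangle > 1 - \varepsilon^2/2$. And the final inequality follows from: 

$$\mathrm{mom}_{\mathcal{D},4}(\mathbf{m}_j) - \Big( \frac{1}{3} - \frac{2}{15}\|\mathbf{v}_j\|^4 \Big) = \frac{2}{15}\Big( \langle \mathbf{v}_j, \mathbf{d}_j \rangle^4 - \langle \mathbf{v}_j, \mathbf{m}_j \rangle^4 \Big) - \frac{2}{15}\sum_{i \ne j} \langle \mathbf{v}_i, \mathbf{m}_j \rangle^4.$$

We now prove (4). Let $\mathbf{w} \in \partial B$. To show $f(\mathbf{d}_j) - f(\mathbf{w}) > 0$, we decompose it as: 

$$\big(f_{\mathbf{v}_j}(\mathbf{d}_j) - f_{\mathbf{v}_j}(\mathbf{w})\big) + \sum_{i \ne j} \big(f_{\mathbf{v}_i}(\mathbf{d}_j) - f_{\mathbf{v}_i}(\mathbf{w})\big) \qquad (5)$$

On the one hand, the left-hand term of (5) is: 

$$f_{\mathbf{v}_j}(\mathbf{d}_j) - f_{\mathbf{v}_j}(\mathbf{w}) = \|\mathbf{v}_j\|^4 - \Big(1 - \frac{\varepsilon^2}{2}\Big)^4 \|\mathbf{v}_j\|^4 \ge \varepsilon^2 \|\mathbf{v}_j\|^4 \qquad (6)$$

because $\varepsilon < 1/\sqrt{2}$. On the other hand, we upper bound the right-hand term 
of (5) by the Taylor-Lagrange formula, which states that there exists $\theta \in (0, 1)$ 
such that $\sum_{i \ne j} \big(f_{\mathbf{v}_i}(\mathbf{w}) - f_{\mathbf{v}_i}(\mathbf{d}_j)\big)$ is equal to: 

$$\Big\langle \sum_{i \ne j} \nabla f_{\mathbf{v}_i}(\mathbf{d}_j), \mathbf{w} - \mathbf{d}_j \Big\rangle + \frac{1}{2}(\mathbf{w} - \mathbf{d}_j) \sum_{i \ne j} \mathrm{H} f_{\mathbf{v}_i}\big(\mathbf{d}_j + \theta(\mathbf{w} - \mathbf{d}_j)\big) (\mathbf{w} - \mathbf{d}_j)^t \qquad (7)$$

Let $\mathbf{g} = \sum_{i \ne j} \nabla f_{\mathbf{v}_i}(\mathbf{d}_j) = 4\sum_{i \ne j} \langle \mathbf{d}_j, \mathbf{v}_i \rangle^3 \mathbf{v}_i$ by Lemma 6. The left-hand term 
of (7) is bounded as: 

$$\Big| \Big\langle \sum_{i \ne j} \nabla f_{\mathbf{v}_i}(\mathbf{d}_j), \mathbf{w} - \mathbf{d}_j \Big\rangle \Big| \le \varepsilon \|\mathbf{g}\|. \qquad (8)$$

Using Lemma 4, the right-hand term of (7) can be bounded as: 

$$\Big| (\mathbf{w} - \mathbf{d}_j) \sum_{i \ne j} \mathrm{H} f_{\mathbf{v}_i}\big(\mathbf{d}_j + \theta(\mathbf{w} - \mathbf{d}_j)\big) (\mathbf{w} - \mathbf{d}_j)^t \Big| \le 12(\alpha/\sqrt{n} + \varepsilon)^2 \varepsilon^2. \qquad (9)$$

Collecting (6), (7), (8) and (9), we obtain: 

$$f(\mathbf{d}_j) - f(\mathbf{w}) \ge \big(\varepsilon \|\mathbf{v}_j\|^4 - \|\mathbf{g}\| - 6(\alpha/\sqrt{n} + \varepsilon)^2 \varepsilon\big)\, \varepsilon,$$

which is $> 0$ by (3). To conclude, it remains to prove that (3) is satisfied when 
$\|\mathbf{v}_j\| \ge$ [...] and $\varepsilon =$ [...] $< 1/\sqrt{2}$. This is shown by tedious computations, 
using weak-orthogonality and Lemma 4. □ 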

Th. ©states that under suitable assumptions on V (which we will discuss shortly), 
if 1 1 Vj 1 1 is not too small, then the secret direction Vj/||vj|| is very close to a local 
minimum of mom£> z(v)i 4 , whose value discloses an approximation of || v, || , be- 
cause it is « | — ^||vj|| 4 . This suggests SolveHZP(A) (Alg. 0) for learning a 
zonotope: SolveHZP(T) is exactly SolveHPP(T) (Alg. 0, except that Step© 
of SolveHPP(A) has been modified, to take into account that || v ? || is no longer 
necessarily equal to 1, but can fortunately be approximated by the value of the 
local minimum. 

First, we discuss the value of a in Th. © . Note that weak-orthogonality is a 
natural property, as shown by the following basic result: 

Lemma 7. Let v e S" and denote by X the random variable X = (v, w) 2 where 
w has uniform distribution over § n . Then X has distribution Beta(l/2,(n — 
l)/ 2 ), Exp(X) = 4 , Exp(X 2 ) = ra / n 3 +2 \ , Exppf 3 ) = w („ + 2) 5 ( ra+ 4) an d more 
generally: Exp(X fe ) = J/2+I-1 Exp^k -1 ). 
Algorithm 4. SolveHZP($\mathcal{X}$): Learning a Zonotope

Input: A set $\mathcal{X}$ of vectors $x \in \mathbb{R}^n$ sampled from $\mathcal{D}_{Z(V)}$, where $V$ is an $m \times n$ matrix
of rank $n$.

Output: An approximation of some row vector of $\pm V$.

1: $L$ := Morphing($\mathcal{X}$) using Alg. [?]
2: $\mathcal{X} := \mathcal{X} \cdot L$
3: Pick $w$ uniformly at random from $S^n$
4: Compute $r$ := Descent($\mathcal{X}, w, \delta$) $\in S^n$ using Alg. [?]; use $\delta = 3/4$ in theory and
   $\delta = 0.7$ in practice.
5: Return $\lambda\, r L^{-1}$ where $\lambda = \big( (\frac{1}{3} - \mathrm{mom}_{\mathcal{X},4}(r)) \cdot \frac{15}{2} \big)^{1/4}$
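
Step 5 can be illustrated on exact moments. The sketch below assumes samples of the form x = sum_i u_i v_i with the u_i independent and uniform over [-1, 1], for which the fourth moment of <x, w> has the closed form used in `fourth_moment`. The attack itself would use the empirical moment of the signature samples. Both function names are hypothetical.

```python
def fourth_moment(V, w):
    """Exact E[<x, w>^4] for x = sum_i u_i * v_i, u_i uniform in [-1, 1] (assumed model)."""
    a = [sum(vk * wk for vk, wk in zip(v, w)) for v in V]  # a_i = <v_i, w>
    s2 = sum(x * x for x in a)
    s4 = sum(x ** 4 for x in a)
    # E[u^2] = 1/3 and E[u^4] = 1/5 give (1/3) * s2^2 - (2/15) * s4.
    return s2 * s2 / 3 - 2 * s4 / 15

def estimate_norm(mom4):
    """Step 5 of Algorithm 4: lambda = ((1/3 - mom4) * 15/2)^(1/4)."""
    return ((1 / 3 - mom4) * 15 / 2) ** 0.25

if __name__ == "__main__":
    V = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # parallelepiped case, V^t V = I
    w = [1.0, 0.0, 0.0]                                       # exactly the direction v_1
    print(estimate_norm(fourth_moment(V, w)))
```

In this degenerate case the estimate equals the norm of v_1 up to floating-point error; for a genuine zonotope the other rows contribute a small error term controlled by weak-orthogonality.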


By studying more carefully the Beta distribution, it is possible to obtain strong
bounds. For instance, Ajtai [?, Lemma 47] showed that for all sufficiently large
$n$, if $v \in S^n$ is fixed and $w$ has uniform distribution over $S^n$, then
$|\langle v, w \rangle| \le (\log n)/\sqrt{n}$ with probability $\ge 1 - 1/n^{(\log n)/2 - 1}$. Since the
probability is subexponentially close to 1, this implies that if $m = n^{O(1)}$ and we
assume that all the directions $v_i/\|v_i\|$ are random, then $V$ is $(\log n)$-weakly
orthogonal with probability asymptotically close to 1.

This gives strong evidence that, if $m = n^{O(1)}$, the assumption on $V$ in Th. [?]
will be satisfied for $\alpha = \log n$. We can now discuss the remaining assumptions.
If a = log n, we may take any index j such that ||vj|| > f?(l/n 13 ): in particular, 
^ INI = we may take e = O (log 3 ri)/y/n. And higher values of a can 

be tolerated, as long as $\alpha = o(n^{1/6})$. Now recall that $\sum_{i=1}^{m} \|v_i\|^2 = n$, thus
$\max_i \|v_i\| \ge \sqrt{n/m}$ and $\|v_i\|$ is on average $\sqrt{n/m}$. In particular, if the number
of perturbations is constant, then $m = O(n)$ and $\max_i \|v_i\| \ge \Omega(1)$, therefore
Th. [?] applies to at least one index $j$, provided that $\alpha = o(n^{1/6})$. In fact, one can
see that the result can even tolerate slightly bigger values of $m$ than $\Theta(n)$, such
as $m = o(n^{7/6}/\log n)$.

While Th. [?] explains why SolveHZP($\mathcal{X}$) (Alg. 4) can heuristically solve the
HZP, it is not a full proof, as opposed to the simpler parallelepiped case. The
obstructions are the following:

- First, we would need to prove that the distance is sufficiently small to enable
  the recovery of the original zonotope vectors, using an appropriate BDD
  solver. Any error on $v_j/\|v_j\|$ is multiplied by $L^{-1}\|v_j\|$. In [?], the error on
  $v_j$ could be made polynomially small for any polynomial, provided that the
  number of samples was (polynomially) large enough. But $\varepsilon$ cannot be chosen
  polynomially small for any arbitrary polynomial in Th. [?].

- Second, we would need to prove that Descent($\mathcal{X}, w, \delta$) (Alg. [?]) finds a
  random local minimum of $\mathrm{mom}_{\mathcal{D}_{Z(V)},4}$ in polynomial time, even in the presence
  of noise to compute $\mathrm{mom}_{\mathcal{D}_{Z(V)},4}$. Intuitively, this is not unreasonable since
  the function $\mathrm{mom}_{\mathcal{D}_{Z(V)},4}$ is very regular, but it remains to be proved.

- Finally, we would need to prove that there are no other local minima, or at
  least, not too many of them.
Regarding the third obstruction, it is easy to prove the following weaker
statement, which implies that global minima of $\mathrm{mom}_{\mathcal{D},4}$ over the unit sphere are
close to some direction $v_j/\|v_j\|$:

Lemma 8. Let $V$ be an $m \times n$ matrix over $\mathbb{R}$ such that $V^t V = I_n$, and $\mathcal{D}$ be the
distribution $\mathcal{D}_{Z(V)}$. Let $w$ be a global maximum of $f(w) = \sum_{i=1}^{m} f_{v_i}(w)$ over $S^n$.
Then there exists j 6 {1, , m} such that: ^ ^ < 1. 


3.3 Experiments 

We now report on experiments with the attack performed on NTRUSign, with 
n up to 502. Our experiments are real-world experiments using signatures of 
uniformly distributed messages. 

Conditions of Th. [?]. Our discussion following Th. [?] suggested that the matrix
$V$ should be heuristically weakly-orthogonal for $\alpha = \log n$. In practice, we may
in fact take $\alpha \approx 5$ for both types of NTRUSign secret bases.

Regarding the norms $\|v_i\|$ after morphing, we experimentally verified that
$\|v_i\| \approx \sqrt{1/k}$ where $k$ is the number of perturbations for NTRUSign transposed
bases (see [?]), as expected by $\sum_{i=1}^{m} \|v_i\|^2 = n$. But for the so-called standard
bases, the situation is a bit different: half of the $\|v_i\|$'s are very small, and the
remaining half are close to $\sqrt{2/k}$. This can be explained by the fact that standard
bases are unbalanced: half of the vectors are much shorter than the other vectors.

For a number of perturbations < 8, we experimentally verified that the "gradient"
$g = \frac{1}{\|v_j\|^3} \sum_{i \neq j} \langle v_j, v_i \rangle^3 v_i$ appearing in the conditions of Th. [?] satisfies
$\|g\| = O(1/n)$ with a small constant < 4 (see [?]).

To summarize, the conditions of Th. [?] are experimentally verified for a number
of perturbations < 8: for all vectors $v_j$'s in the case of transposed bases, and for
half of the vectors $v_j$'s in the case of standard bases.

Modifications to the original NR attack. We already explained that the original
NR algorithm SolveHPP($\mathcal{X}$) (Alg. [?]) had to be slightly modified into
SolveHZP($\mathcal{X}$) (Alg. 4): more precisely, Step [?] is modified.

However, because Th. [?] states that the secret direction might be perturbed
by some small $\varepsilon$, we also implemented an additional modification: instead of the
elementary BDD algorithm by rounding, we used in the final stage a special BDD
algorithm tailored for NTRU lattices, which is a tweaked version of Odlyzko's
meet-in-the-middle attack on NTRU described in [?]. Details are given in [?].

Practical cryptanalysis. We first successfully applied the optimized NR attack to
the original NTRUSign-251 scheme with one perturbation (which corresponds
to a lattice dimension of 502), as initially submitted to the IEEE P1363 standard:
about 8,000 signatures were sufficient to recover the secret key, which should be
compared with the 400 signatures of the original attack [23] when there was no
perturbation. This means that the original NTRUSign-251 scheme [?] is now
completely broken.

Furthermore, we performed additional experiments for varying dimension and
number of perturbations, for the parameters proposed in the latest NTRU
article [?], where transposed bases are used. Table 1 summarizes the results
obtained: each successful attack took less than a day, and the MiM error recovery
algorithm ran with less than 8Gb of memory.

Table 1. Experiments with the generalized NR-attack on the latest NTRUSign
parameters [?]

Security level : dimension n   Toy : 94       80-bit : 314   112-bit : 394   128-bit : 446
0 perturbation                 300:(0,1)      400:(0,1)      400:(0,1)       600:(0,1)
1 perturbation                 1000:(1,2)     5000:(0,1)     4000:(0,1)      4000:(0,0)
2 perturbations                10000:(5,3)    12000:(0,2)
3 perturbations                12000:(5,4)
4 perturbations                100000:(0,1)


In this table, each non-empty cell represents a successful attack for a given transposed
basis (the column indicates the security level and the dimension) and number of
perturbations (row). These cells have the form $s : (e, w)$, where $s$ is the number of
signatures used by the learning algorithm, and where $e$ and $w$ are computed from the
error vector of the best approximation given by a descent. The running time of our
MiM-Algorithm is about $(n/2)^{\lceil w/2 \rceil + 1}$ for such small $w$.


Our experiments confirm our theoretical analysis: NTRUSign with a constant 
number of perturbations is insecure, but we see that the number of signatures 
required increases with the number of perturbations. 


4 Learning a Deformed Parallelepiped: Breaking the 
IEEE-IT Countermeasure 

In this section, we show that the deformation suggested in [?] is unlikely to prevent
the NR attack [23]. More generally, we show that the NR attack heuristically still
works if the deformation is only partial, which means that it preserves at least one
of the canonical axes, namely there exists at least one index $i$ such that:

- for all $x \in [-1/2, 1/2)^n$, $[\delta(x)]_i = 0$

- $\delta(x)$ is independent of $x_i$: $(\forall j \neq i,\ x_j = y_j) \Rightarrow \delta(x) = \delta(y)$

Such an index $i$ is said to be ignored by the deformation $\delta$. And it is clear that
$\delta_{\mathrm{IEEE}}$ is partial by definition (see Sect. [?]), because it ignores exactly all index
i . Our main result is the following, whose proof is given in [?]:

Theorem 4. Let $\delta$ be a partial deformation, and $i$ be an index ignored by $\delta$. Let
$\mathcal{D} = 2 \cdot d_\delta(\mathcal{U}_n)$ and $M \in GL_n(\mathbb{R})$ be an invertible matrix and $G = \mathrm{Cov}(\mathcal{D} \cdot M)$. Let
$L$ be such that $L L^t = G^{-1}$. Then $r = \frac{1}{\|m_i L\|} \cdot m_i L$ is a local minimum of $\mathrm{mom}_{\mathcal{D}',4}$
over the unit sphere, where $\mathcal{D}' = \mathcal{D} \cdot M \cdot L$.

While this is a strong theoretical argument supporting why the NR attack still 
works, it is not a full proof, for reasons similar to the zonotope case (see the 
previous section): there may be other minima, and we did not prove that the 
gradient descent efficiently finds minima. 
Experimental results. The attack was run, using 300,000 signatures, to recover
the secret key in 80-bit, 112-bit and 128-bit NTRUSign security level settings, 
and each run led to a secret key recovery, in about two days. No other local min- 
imum was found. Though the samples no longer belong to a set stable by NTRU 
symmetry group 0^ TRU , we may still try to apply the symmetry trick, to multi- 
ply the number of samples by N, like in m- This modifies the distribution of the 
sample to the average of its orbit : 6^ TRU (D) = tr(x) : x «— V, a <— W(6>( TRU ). 
It turns out that applying the attack on such an averaged distribution leads 
once again to descents converging to some basis vectors: in fact, by symmetry, 
all of them are equally likely. The attack used 2,000 signatures, and ran in less 
than an hour, on the same basis. Intuitively, this averaging strongly reduces 
the co-dependence between the coordinates of x <— D a . making the resulting 
distribution much closer to a parallelepiped than V. 
Abstract. In the last two decades, many computational problems arising
in cryptography have been successfully reduced to various systems
of polynomial equations. In this paper, we revisit a class of polynomial
systems introduced by Faugère, Perret, Petit and Renault. Based on new
experimental results and heuristic evidence, we conjecture that their
degrees of regularity are only slightly larger than the original degrees of
the equations, resulting in a very low complexity compared to generic
systems. We then revisit the application of these systems to the elliptic
curve discrete logarithm problem (ECDLP) for binary curves. Our
heuristic analysis suggests that an index calculus variant due to Diem
requires a subexponential number of bit operations $O(2^{c\, n^{2/3} \log n})$ over
the binary field $\mathbb{F}_{2^n}$, where $c$ is a constant smaller than 2. According
to our estimations, generic discrete logarithm methods are outperformed
for any $n > N$ where $N \approx 2000$, but elliptic curves of currently
recommended key sizes ($n \approx 160$) are not immediately threatened. The
analysis can be easily generalized to other extension fields.


1 Introduction 


While linear systems of equations can be efficiently solved with Gaussian
elimination, polynomial systems are much harder to solve in general. After their
introduction by Buchberger [?], Gröbner bases have become the most popular
way to solve polynomial systems of equations, in particular since the
development of fast algorithms like F4 [?] and F5 [?]. Polynomial systems arising in
cryptography tend to have a special structure that simplifies their resolution.
In the last twenty years, many cryptographic challenges have been first reduced
to polynomial systems of equations and then solved with fast and sometimes
dedicated Gröbner basis algorithms [?].
Our Contribution

In this paper, we revisit a particular class of polynomial systems introduced by
Faugère et al. [?]. These systems naturally arise by deploying a multivariate
polynomial equation over an extension field into a system of polynomial
equations over the ground prime field (a technique commonly called Weil descent).

We first observe that polynomial systems arising from a Weil descent are a
natural generalization of a well-known family of polynomial systems appearing
in the cryptanalysis of HFE [?]. Starting from this observation, we extend
various experimental and theoretical results on HFE to
the more general class of polynomial systems arising from a Weil descent. Our
results suggest that the degrees of regularity of these systems are only slightly
larger than the degrees of their equations, essentially as small as they could be.

Following [?], we subsequently study an elliptic curve discrete logarithm
algorithm of Diem [?] in the case of binary fields. Based on our heuristic analysis
of polynomial systems arising from a Weil descent, we conjecture that the elliptic
curve discrete logarithm problem can be solved over the binary field $\mathbb{F}_{2^n}$ in
subexponential time $O(2^{c\, n^{2/3} \log n})$, where $c$ is a constant smaller than 2. For $n$
prime, this problem was previously thought to have complexity $O(2^{n/2})$.

Our analysis of polynomial systems arising from a Weil descent can also be
applied to the factorization problem in $SL(2, \mathbb{F}_{2^n})$, to HFE and to other discrete
logarithm problems. These applications will be discussed in an extended version
of this paper [?]. Although we focus on characteristic 2 in this paper, most of
our results can be easily extended to other characteristics.


Outline 

The remainder of this paper is organized as follows. Section 2 contains most of
the notations and definitions used in the paper. Section 3 provides general
background on algebraic cryptanalysis with Gröbner bases. Section 4 contains our
new analysis of polynomial systems arising from a Weil descent. The application
to Diem's algorithm is detailed in Section 5 and Section 6 concludes the paper.

2 Definitions and Notations 

We mostly follow the notations introduced in [?]. For any "small" prime $p$ and
any $n \in \mathbb{Z}$, we write $\mathbb{F}_{p^n}$ for the finite field with $p^n$ elements. We see the field
$\mathbb{F}_{p^n}$ as an $n$-dimensional vector space over $\mathbb{F}_p$ and we let $\{\theta_1, \ldots, \theta_n\}$ be a basis
for $\mathbb{F}_{p^n}/\mathbb{F}_p$. With some abuse of notations, we use bold letters for all elements,
variables and polynomials over $\mathbb{F}_{p^n}$ and normal letters for all elements, variables
and polynomials over $\mathbb{F}_p$. If $x_1, \ldots, x_N$ are variables defined over a field $K$, we
write $R := K[x_1, \ldots, x_N]$ for the ring of polynomials in these variables. Given
a set of polynomials $f_1, \ldots, f_\ell \in R$, the ideal $I(f_1, \ldots, f_\ell) \subset R$ is the set of
polynomials $\sum_{i=1}^{\ell} g_i f_i$, where $g_1, \ldots, g_\ell \in R$. We write $\mathrm{Res}_{x_i}(f_1, f_2)$ for the
resultant of $f_1, f_2 \in R$ with respect to the variable $x_i$. A monomial of $R$ is a
power product $\prod_{i=1}^{N} x_i^{e_i}$ where $e_i \in \mathbb{N}$. A monomial ordering for $R$ is an ordering
$>$ such that $m_1 > m_2 \Rightarrow m_1 m_3 > m_2 m_3$ for any monomials $m_1, m_2, m_3$ and
$m > 1$ for any monomial $m \neq 1$. The leading monomial $\mathrm{LM}(f)$ of a polynomial $f \in R$
for a given ordering is equal to its largest monomial according to the ordering. Its
leading term is the corresponding term. For any polynomial $f \in R$, we denote the
set of monomials of $f$ by $\mathrm{Mon}(f)$. We measure the memory and time complexities
of algorithms by respectively the number of bits and bit operations required.
Actual experimental results are given in megabytes and seconds. We write $O$ for
the "big O" notation: given two functions $f$ and $g$ of $n$, we say that $f = O(g)$
if there exist $N, c \in \mathbb{Z}^+$ such that $n > N \Rightarrow f(n) \le c\, g(n)$. Similarly, we write
$o$ for the "small o" notation: given two functions $f$ and $g$ of $n$, we say that
$f = o(g)$ if for any $\epsilon > 0$, there exists $N \in \mathbb{Z}$ such that for any $n > N$, we have
$|f(n)| \le \epsilon |g(n)|$. Finally, we write $\omega$ for the linear algebra constant. Depending
on the algorithm used for linear algebra, we have $2.376 \le \omega \le 3$.
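
As an illustration of these definitions, the sketch below represents a monomial by its exponent tuple and a polynomial by a dictionary from exponent tuples to coefficients. It uses the graded lexicographic ordering as one possible monomial ordering; this choice and the names `grlex_key` and `leading_monomial` are ours, not the paper's.

```python
def grlex_key(exponents):
    """Sort key for graded lexicographic order: total degree first, then lexicographic."""
    return (sum(exponents), exponents)

def leading_monomial(poly):
    """LM(f): the largest monomial of f (with nonzero coefficient) for the ordering."""
    return max((m for m, c in poly.items() if c != 0), key=grlex_key)

if __name__ == "__main__":
    # f = x1*x2 + x1^2 + x2^3 in K[x1, x2]
    f = {(1, 1): 1, (2, 0): 1, (0, 3): 1}
    print(leading_monomial(f))                          # the degree-3 monomial x2^3
    print(leading_monomial({(1, 1): 1, (2, 0): 1}))     # x1^2 beats x1*x2 at equal degree
```

Multiplying both sides of a comparison by a third monomial adds the same exponent tuple to both, which changes neither the degree comparison nor the lexicographic one, so this key defines a monomial ordering in the sense above.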

3 Background on Polynomial System Resolution 

Let $R$ be a polynomial ring and let $>$ be a fixed monomial ordering for this ring.
A Gröbner basis [?] of an ideal $I(f_1, \ldots, f_\ell) \subset R$ is a basis $\{f'_1, \ldots, f'_{\ell'}\}$ of
this ideal such that for any $f \in I(f_1, \ldots, f_\ell)$, there exists $i \in \{1, \ldots, \ell'\}$ such that
$\mathrm{LT}(f'_i) \mid \mathrm{LT}(f)$. The first Gröbner basis algorithm was provided by Buchberger in
his PhD thesis [?]. Lazard [?] later observed that computing a Gröbner basis
is essentially equivalent to performing linear algebra on Macaulay matrices at a
certain degree.

Definition 1 (Macaulay Matrix). Let $R$ be a polynomial ring over
a field $K$ and let $B_d := \{m_1 > m_2 > \cdots\}$ be the sorted set of all monomials
of degree $\le d$ for a fixed monomial ordering. Let $F := \{f_1, \ldots, f_\ell\} \subset R$ be
a set of polynomials of degrees $\le d$. For any $f_i \in F$ and $t_j \in B_d$ such that
$\deg(f_i) + \deg(t_j) \le d$, let $g_{i,j} := t_j f_i$ and let $c_{i,j}^k \in K$ be such that
$g_{i,j} = \sum_{m_k \in B_d} c_{i,j}^k m_k$. The Macaulay matrix $M_d(F)$ of degree $d$ is a matrix containing
all the coefficients $c_{i,j}^k$, such that each row corresponds to one polynomial $g_{i,j}$
and each column to one monomial $m_k \in B_d$.
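
Definition 1 can be turned into a short construction. The following is a sketch under our own conventions: polynomials are dictionaries from exponent tuples to coefficients in a prime field of size `p`, columns are sorted by decreasing graded lexicographic order, and the names `monomials_up_to` and `macaulay_matrix` are hypothetical.

```python
from itertools import product

def monomials_up_to(nvars, d):
    """B_d: all exponent tuples of total degree <= d, sorted in decreasing grlex order."""
    mons = [e for e in product(range(d + 1), repeat=nvars) if sum(e) <= d]
    return sorted(mons, key=lambda e: (sum(e), e), reverse=True)

def degree(poly):
    """Total degree of a polynomial given as {exponent tuple: coefficient}."""
    return max(sum(e) for e in poly)

def macaulay_matrix(polys, nvars, d, p=2):
    """Rows are the products t_j * f_i with deg(f_i) + deg(t_j) <= d, written over B_d."""
    cols = monomials_up_to(nvars, d)
    index = {m: k for k, m in enumerate(cols)}
    rows = []
    for f in polys:
        for t in cols:
            if degree(f) + sum(t) <= d:
                row = [0] * len(cols)
                for e, c in f.items():
                    shifted = tuple(a + b for a, b in zip(e, t))  # monomial of t * e
                    row[index[shifted]] = (row[index[shifted]] + c) % p
                rows.append(row)
    return cols, rows

if __name__ == "__main__":
    f1 = {(1, 1): 1, (0, 0): 1}   # x1*x2 + 1
    f2 = {(1, 0): 1, (0, 1): 1}   # x1 + x2
    cols, rows = macaulay_matrix([f1, f2], nvars=2, d=2)
    for row in rows:
        print(row)
```

The sketch does not append field equations or remove redundant rows; algorithms such as F4 and F5 add those optimizations on top of this matrix.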

The idea behind Lazard's observation is linearization: new equations for the
ideal are constructed by algebraic combinations of the original equations, every
monomial term appearing in the new equations is treated as an independent new
variable, and the system is solved with linear algebra. Gröbner basis algorithms
like F4 [?] and F5 [?] successively construct Macaulay matrices of increasing
sizes and remove linear dependencies in the rows until a Gröbner basis is found.
Moreover, they optimize the computation by avoiding monomials $t_j$ that would
produce trivial linear combinations such as $f_1 f_2 - f_2 f_1 = 0$. The complexity of
this strategy is determined by the cost of linear algebra on the largest Macaulay
matrix occurring in the computation.

The degree of the largest Macaulay matrix appearing in a Gröbner basis
computation with the algorithm F5 is called the degree of regularity $D_{reg}$. For
a "generic" sequence of polynomials $f_1, \ldots, f_\ell \in R$ (with $\ell \le n$), this degree
is equal to $1 + \sum_{i=1}^{\ell} (\deg(f_i) - 1)$ [6]. The degree of regularity can be precisely
estimated in the case of regular and semi-regular sequences [6,8] and (assuming
a variant of Fröberg conjecture) in a few other cases [?]. However, precisely
estimating this value for other classes of systems (in particular for the various
structured systems appearing in cryptanalysis problems) may be a very difficult
task. In practice, the degree of regularity may often be approximated by the
first degree at which a non-trivial degree fall occurs during a Gröbner basis
computation.
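
The generic value quoted above is a one-line computation. The sketch below, with a function name of our choosing, evaluates it for a list of equation degrees.

```python
def generic_degree_of_regularity(degrees):
    """1 + sum_i (deg(f_i) - 1) for a generic sequence with at most as many equations as variables."""
    return 1 + sum(d - 1 for d in degrees)

if __name__ == "__main__":
    print(generic_degree_of_regularity([2, 2, 2]))   # three generic quadrics
    print(generic_degree_of_regularity([3, 2]))      # one cubic and one quadric
```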

Definition 2. Let $R$ be a polynomial ring over a field $K$ and let $F := \{f_1, \ldots, f_\ell\}$
$\subset R$. The first fall degree of $F$ is the smallest degree $D_{firstfall}$ such that there
exist polynomials $g_i \in R$ with $\max_i(\deg(f_i) + \deg(g_i)) = D_{firstfall}$, satisfying
$\deg(\sum_{i=1}^{\ell} g_i f_i) < D_{firstfall}$ but $\sum_{i=1}^{\ell} g_i f_i \neq 0$.

We have $D_{reg} \ge D_{firstfall}$. For many classes of polynomial systems, the two
definitions lead to very close numbers. Although this is not true in general
(counter-examples can be easily produced), it seems to be true for "random systems"
and "most real-life systems of equations" [?, p. 350] including HFE and its
variants [?]. This can intuitively be explained by the observation
that an extremely large number of relations with a degree fall occur at the degree
$D_{firstfall}$ or the degree $D_{firstfall} + 1$ in these contexts, and these low degree
relations can in turn be combined to produce lower degree relations [?, p. 561], until
a Gröbner basis is finally found. In fact, the first fall degree has even sometimes
been called degree of regularity in the cryptography community [24,22,23].

Many polynomial systems arising in cryptanalysis are very far from
generic ones. In fact, their special structures often induce lower degrees of
regularity, hence much better time complexities. Gröbner basis techniques have
successfully attacked many cryptosystems, including HFE and its variants
[?], the Isomorphism of Polynomials [?] and some McEliece
variants [?]. In many cases, the resolution of these systems could be accelerated
using dedicated Gröbner basis algorithms that exploited the particular
structures. As was first pointed out in [?], this is also the case for polynomial
systems arising from a Weil descent.

4 Polynomial Systems Arising from a Weil Descent 

Let $n, n', m$ be positive integers and let $V$ be a vector subspace of $\mathbb{F}_{2^n}/\mathbb{F}_2$ with
dimension $n'$. Let $f \in \mathbb{F}_{2^n}[x_1, \ldots, x_m]$ be a multivariate polynomial with
degrees bounded by $2^t - 1$ with respect to all variables. In [?], Faugère et al.
considered the following problem:

Find $x_i \in V$, $i = 1, \ldots, m$, such that $f(x_1, \ldots, x_m) = 0$. (1)

The constraints $x_i \in V$, $i = 1, \ldots, m$, are called linear constraints. From now
on, we assume that $m n' \approx n$ such that Problem (1) has about one solution on
average. We also assume $n' \ge t$. The multilinear case ($t = 1$) was first considered
in [?] and later extended in [?].

Following [?], Problem (1) can be reduced to a system of polynomial equations.
Let $\{\theta_1, \ldots, \theta_n\}$ be a basis of $\mathbb{F}_{2^n}$ over $\mathbb{F}_2$ and let $\{v_i \mid i = 1, \ldots, n'\}$ be a
basis of $V$ over $\mathbb{F}_2$. We define $m n'$ variables $x_{ij}$ over $\mathbb{F}_2$ such that $x_i = \sum_{j=1}^{n'} x_{ij} v_j$
and we group them into $m$ blocks of variables $X_i := \{x_{ij} \mid j = 1, \ldots, n'\}$. By
substituting each $x_i$ in $f$, decomposing in the basis $\{\theta_1, \ldots, \theta_n\}$ and reducing
by the field equations $x_{ij}^2 - x_{ij} = 0$, we obtain

$$0 = f(x_1, \ldots, x_m) = f\Big(\sum_{j=1}^{n'} x_{1j} v_j, \ldots, \sum_{j=1}^{n'} x_{mj} v_j\Big) = [f]^{\downarrow}_1 \theta_1 + \ldots + [f]^{\downarrow}_n \theta_n$$

for some $[f]^{\downarrow}_1, \ldots, [f]^{\downarrow}_n \in \mathbb{F}_2[x_{11}, \ldots, x_{mn'}]$ that depend on $f$ and on the vector
subspace $V$. Problem (1) can therefore be reformulated as finding a solution to the
(algebraic) system

$$[f]^{\downarrow}_1 = 0, \ldots, [f]^{\downarrow}_n = 0. \qquad (2)$$

Due to the bounds on the degrees of $f$, this system has a block structure: the
degrees of all polynomials $[f]^{\downarrow}_k$ are bounded by $t$ with respect to all blocks of
variables. The resolution of System (2) can therefore be greatly accelerated using
block-structured Gröbner basis algorithms [?].
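
The degree bound can be observed on a toy instance. The sketch below makes several simplifying assumptions that are ours, not the paper's: the field is $\mathbb{F}_{2^4}$ defined by $x^4 + x + 1$ with its polynomial basis, the polynomial is a univariate monomial $f(x) = x^d$ (so $m = 1$), and $V$ is the whole field. It computes each coordinate of $f(x)$ as a Boolean function of the four coordinates of $x$ and reads its degree from the algebraic normal form.

```python
N = 4
MODULUS = 0b10011  # x^4 + x + 1, irreducible over F_2

def gf_mul(a, b):
    """Multiply two elements of F_16 written in the polynomial basis."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> N) & 1:
            a ^= MODULUS
    return result

def gf_pow(x, d):
    """x^d in F_16 by repeated multiplication (d >= 0)."""
    result = 1
    for _ in range(d):
        result = gf_mul(result, x)
    return result

def anf_degree(truth_table):
    """Degree of a Boolean function in N variables, via the binary Moebius transform."""
    coeffs = list(truth_table)
    for i in range(N):
        bit = 1 << i
        for x in range(1 << N):
            if x & bit:
                coeffs[x] ^= coeffs[x ^ bit]
    degrees = [bin(x).count("1") for x in range(1 << N) if coeffs[x]]
    return max(degrees) if degrees else 0

def weil_descent_degree(d):
    """Largest degree among the N coordinate polynomials of x -> x^d."""
    return max(
        anf_degree([(gf_pow(x, d) >> k) & 1 for x in range(1 << N)])
        for k in range(N)
    )

if __name__ == "__main__":
    for d in (1, 3, 5, 7):
        print(d, weil_descent_degree(d))
```

For $d = 3 \le 2^2 - 1$ the coordinate polynomials have degree 2, and for $d = 7 \le 2^3 - 1$ they have degree 3, in line with the bound $t$ stated above.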


Link to HFE. In this paper, we observe that a particular instance of
Problem (1) had previously been studied in the cryptography literature. Indeed, the
well-known problem of inverting HFE [?] leads to a particular instance
of System (2), where the polynomial $f$ is univariate ($m = 1$) and the linear
constraints are trivial ($V = \mathbb{F}_{2^n}$).¹ Interestingly, although the polynomial $f$ used in
HFE has a particular shape (it leads to quadratic equations over $\mathbb{F}_2$), we will see
that this shape has generically little influence on the complexity of Problem (1).

Ten years of research on HFE systems have shown that their degrees of
regularity are abnormally low compared to generic systems, resulting in very efficient
attacks. Although no definitive proof of these results has been published yet, the
experimental observations of [?] are now being supported by theoretical
evidence such as the isolation of a subsystem with fewer variables [?], the existence
of many low degree equations [?], first fall degree computations [22,24] and
complexity results on the MinRank problem [?]. In this paper, we generalize
some of these results to polynomial systems arising from a Weil descent.


Experimental Observations. We start our analysis of these systems with an 
experimental study of their degree of regularity for various parameters n, m, n', t. 
For each set of parameters, we generate a random vector space V of dimension 
$n'$ and a random multivariate polynomial $f(x_1, \ldots, x_m)$ with degree bounded
by $2^t - 1$ with respect to each variable. We then perform a Weil descent on this

1 In HFE contexts, the attacker is not given $f$ but only a "hidden" version of
System (2). This can be ignored in the complexity analysis of Gröbner basis algorithms
since the hiding transformation only consists of a linear combination of the
equations and a linear change of variables [?].
polynomial and we append the field equations to the system. Finally, we apply
the Magma function Groebner to the result and we collect the maximal degree D
reached during the computation, as given by the Verbose output of the Magma
function. We repeat each experiment 100 times.


Table 1. Average maximal degree reached in Gröbner basis experiments, average
computation time (in seconds) and maximal memory requirements (in MB) for random
polynomials



Table 1 reports the average value of $D$ for these experiments, as well as the
average computation time and the maximal memory used (all experiments were
done on an Intel Xeon CPU X5500 processor running at 2.67 GHz, with 24 GB
RAM). As is often the case in Gröbner basis computations, our experiments were
limited more by the memory requirements than by the computation time.

For all parameter sets, the maximal degrees occurring during Gröbner basis
computations were much smaller than the degrees of regularity of regular or
semi-regular systems with the same degrees. In fact, our experiments suggest
that the degree of regularity of System (2) is not much higher than the value
$mt + 1$. In other words, since the original equations have degree $mt$, the degree of
regularity is essentially as small as it could be. The even lower values obtained
for all parameter sets such that $t = n'$ can be explained by a probable degeneracy
in the degrees of the equations. Taking $m = 1$, we recover known experimental
results on HFE [?].

Heuristic Upper Bound on $D_{reg}$. As a first step towards explaining these
experimental results, we follow Granboulan et al. [?] and we bound the degree
of regularity of System (2) from above by the degree of regularity of a smaller
system with a lower number of variables. We now suppose that $\{\theta_1, \ldots, \theta_n\}$ is a
normal basis of $\mathbb{F}_{2^n}$ over $\mathbb{F}_2$, such that $\theta_j := \theta^{2^j}$ for some $\theta \in \mathbb{F}_{2^n}$. Let $v_{ij} \in \mathbb{F}_2$
such that $v_i = \sum_{j=1}^{n} v_{ij} \theta_j$. We define $nm$ auxiliary binary variables $y_{ij}$ such
that $x_i = \sum_{j=1}^{n} y_{ij} \theta_j$. Proceeding to a Weil descent as above, we obtain a new
system²

$$[f]^{\downarrow y}_1 = 0, \ldots, [f]^{\downarrow y}_n = 0 \qquad (3)$$

in the variables $y_{ij}$, to which we add $m(n + n')$ field equations $y_{ij}^2 - y_{ij} = 0$ and
$x_{ij}^2 - x_{ij} = 0$, as well as $mn$ linear equations $y_{ij} = \sum_{k=1}^{n'} x_{ik} v_{kj}$ modeling the
linear constraints. The resulting system of $m(n + n')$ variables and $n + m(n + n') + mn$
equations is equivalent to System (2) (with the field equations), hence
they have the same degree of regularity.

Following Granboulan et al. [?], we perform additional modifications on this
system to obtain a new system with fewer variables and higher or equal degree of
regularity. First, we observe that linear equations do not contribute to the degree
of regularity and can therefore be removed without affecting it. The resulting
system is composed of $n + mn$ equations containing only the variables $y_{ij}$ and
$mn'$ field equations $x_{ij}^2 - x_{ij} = 0$. Without decreasing the degree of regularity,
we can focus on the first part containing Equations (3) and the field equations
$y_{ij}^2 - y_{ij} = 0$.

In the next step, we observe that the degree of regularity of this system is
not affected if we see the variables $y_{ij}$ over $\mathbb{F}_{2^n}$ rather than over $\mathbb{F}_2$. Thanks
to the field equations, the set of solutions is not affected by this change either.
We then apply an invertible linear transformation on Equations (3), defined by
$F_i := \sum_{j=1}^{n} \theta^{2^{i+j-1}} [f]^{\downarrow y}_j$ for $i = 1, \ldots, n$. This transformation implies $F_i = F_1^{2^{i-1}}$.

Finally, we perform a linear change of variables defined by $z_{ij} := \sum_{k=1}^{n} \theta^{2^{j+k-1}} y_{ik}$
for $i = 1, \ldots, m$, and $j = 1, \ldots, n$. Since this corresponds to setting $z_{i1} = x_i$,
$z_{i2} = x_i^2, \ldots, z_{in} = x_i^{2^{n-1}}$, each $F_k$ only depends (linearly) on $z_{ij}$, $k \le j \le t + k - 1$.
A last linear transformation changes the field equations into $z_{ij}^2 = z_{i,j+1}$
and $z_{in}^2 = z_{i1}$.

Since $F_2 = F_1 \cdot F_1$ modulo the field equations, the polynomial $F_2$ can be
expressed at the degree $2mt$ as an algebraic combination of $F_1$ and the field
equations. Similarly, all polynomials $F_i$, $i > 2$ can be recovered at degree $2mt$
from algebraic combinations of $F_1$ and the field equations. Therefore, the degree
of regularity of the original system is smaller than the maximum of $2mt$ and
the degree of regularity of the system $\{F_1 = 0;\ z_{ij}^2 = z_{i,j+1},\ i = 1, \ldots, m,\ j = 1, \ldots, n - 1;\ z_{in}^2 = z_{i1},\ i = 1, \ldots, m\}$.
Finally like [?], we bound this last degree by the degree of regularity of the
subsystem $\{F_1 = 0;\ z_{ij}^2 = z_{i,j+1},\ i = 1, \ldots, m,\ j = 1, \ldots, t - 1\}$. Assuming
that this system behaves like a generic
system with the same degrees and the same number of variables³, its degree of

2 We add a subscript $y$ to the arrows in System (3) to stress that the Weil descent is
done on the $y_{ij}$ variables and to distinguish this system from System (2).

3 A similar assumption of semi-regularity is needed in [?] to apply Bardet's theorem.


458 C. Petit and J.-J. Quisquater 


regularity can be bounded by $m(2t - 1)$ using Macaulay's bound. Under this
heuristic assumption, we conclude that the degree of regularity of System (2) is
bounded by $2mt$.

We point out that the value $2mt$ is already much below the degree of regularity
of a generic system of equations (or even a generic binary system of equations)
with the same degrees [?]. Still, our experiments suggest that this bound is
not even tight. A tighter bound can be obtained with a seemingly stronger (yet
"classical") heuristic assumption.


First Fall Degree. An important characteristic of HFE systems is the existence
of many algebraic combinations of the equations that have a degree lower than
would be expected for a generic system. Similar low degree equations were
identified for System (2). More precisely, Faugère et al. [?] showed that for
any monomial $m \in \mathbb{F}_{2^n}[x_1, \ldots, x_m]$, the equations obtained by applying a Weil
descent on the polynomial $m f$ are algebraic combinations of the equations of
System (2) that produce a degree fall. By the way they are constructed, the
existence of these equations is very specific to polynomial systems arising from
a Weil descent. For $m := x_1$, we immediately deduce:

Proposition 1. The first fall degree of System (2) is at most $mt + 1$.

This proposition provides a heuristic explanation for the degrees of regularity
observed above since the first fall degree is often a good approximation of the
degree of regularity. As recalled in Section 3, this heuristic assumption is
"classical" in algebraic cryptanalysis, and it has in particular been verified for various
HFE-like systems [?].

Assumption 1. Let $n, m, t, n' \in \mathbb{Z}$. Let $f$ be generated as in our experiments.
For all but a negligible fraction of the resulting systems, we have
$D_{reg} = D_{firstfall} + o(D_{firstfall})$.

The assumption intuitively makes sense for System (2) since not only one but
many degree falls are occurring at degree $D_{firstfall}$ and the next ones (each
monomial $m$ leads to new degree falls).


Heuristic Complexity Bounds for Problem (1). Given the degree of
regularity, the complexity of Problem (1) simply follows from the cost of linear algebra.


Proposition 2. If Assumption 1 holds, Problem (1) can be solved with standard
Gröbner basis algorithms (like F4 or F5) in time $O(n^{\omega D})$ and memory $O(n^{2D})$,
where $\omega$ is the linear algebra constant and $D \approx mt$.
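
To get a feeling for the orders of magnitude in Proposition 2, the sketch below evaluates the base-2 logarithms of $n^{\omega D}$ and $n^{2D}$ with $D = mt$. It ignores the constants hidden in the $O$ notation and the lower-order term in $D$, so it is only a rough indicator; the function names are ours.

```python
import math

def log2_time(n, m, t, omega=3.0):
    """log2 of n^(omega * D) with D = m * t (hidden constants ignored)."""
    return omega * m * t * math.log2(n)

def log2_memory(n, m, t):
    """log2 of n^(2 * D) with D = m * t (hidden constants ignored)."""
    return 2 * m * t * math.log2(n)

if __name__ == "__main__":
    # Toy parameters chosen for illustration only.
    print(log2_time(16, 1, 2), log2_memory(16, 1, 2))
```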

In the univariate case, this estimation reduces to $D \approx t$ which perfectly matches
known cryptanalysis results on HFE algebraic systems [?]. Interestingly, the
special shape of HFE polynomials (they deploy to quadratic equations over $\mathbb{F}_2$)
seems to have no impact on the degree of regularity (although further restrictions
on the shape may have an impact as pointed out in [?]). In the multilinear case,
the estimation provided by Proposition 2 becomes $D \approx m$ which matches the
experimental data of [?].

As observed in j-l.'H.'llj , the block structure of System (j2J) can be exploited to 
accelerate its resolution. 

Proposition 3. If Assumption 1 holds, Problem 0 can be solved with block
Gröbner basis algorithms in time $O((n')^{\omega D})$ and memory $O((n')^{2D})$, where $\omega$ is
the linear algebra constant and $D \approx mt$.

Additional heuristic methods like hybrid approaches (consisting in mixing ex- 
haustive search and polynomial system resolution j22Ej) may lead to substantial 
complexity improvements in practice, as was described in for the multilinear 
case. 

5 Index Calculus for Elliptic Curves 

We now turn to the main application (so far) of Problem 0. As pointed out 
in m, an instance of Problem (0 appears in the relation search step of an in- 
dex calculus algorithm for elliptic curves proposed by Diem uni- Given a cyclic 
(additive) group G, a generator P of this group and another element Q of G, the 
discrete logarithm problem asks for computing an integer k such that Q = kP. 
Groups typically used in cryptography include the multiplicative groups of finite 
fields, groups of points on elliptic curves and hyperelliptic curves and Jaco- 
bians of higher genus curves. Index calculus algorithms |43l25j with subexponen- 
tial complexities have long been obtained for the multiplicative groups of finite 
fields [HI til2l- 7 )l!TH| and more recently for the Jacobian groups of hyperelliptic 
curves [3l3fil35j . 

In 2004, Semaev introduced his summation polynomials and identified their 
potential application to build index calculus algorithms on elliptic curves m 
over prime fields F p . These ideas were independently extended by Gaudry [T7j 
and Diem j2Hj to elliptic curves over composite fields F p » . Following this ap- 
proach, Gaudry (2Z| and later Joux and Vitse [4014 1[ obtained index calculus 
algorithms running faster than generic algorithms for any p and any n > 3. On 
the other hand, Diem [2012 1j identified some families of curves with a subexpo- 
nential time index calculus algorithm by letting p and n grow simultaneously 
in an appropriate way. As far as was known at the moment, the two families of 
elliptic curves recommended by standards m (elliptic curves over prime fields 
$\mathbb{F}_p$ or over binary fields $\mathbb{F}_{2^n}$ with $n$ prime) remained immune to these attacks.
In 2012, Faugere et al. [221 observed that the computation of the relations in an 
algorithm of Diem for binary fields El could be reduced to special instances of 
Problem 0. 


Diem’s Variant of Index Calculus. Let $K$ be a finite field and let $E$ be an
elliptic curve over $K$ defined by the equation

$E : y^2 + xy = x^3 + a_2 x^2 + a_6$

for some $a_2, a_6 \in \mathbb{F}_{2^n}$. Semaev’s summation polynomials $S_r$ are multivariate
polynomials satisfying $S_r(x_1, \dots, x_r) = 0$ for some $x_1, \dots, x_r \in K$ if and only
if there exist $y_1, \dots, y_r \in \bar{K}$ such that $(x_i, y_i) \in E(\bar{K})$ and $(x_1, y_1) + \cdots +
(x_r, y_r) = P_\infty$ m. The summation polynomials can be recursively computed
as $S_2(x_1, x_2) := x_2 + x_1$, $S_3(x_1, x_2, x_3) := x_1^2 x_2^2 + x_1^2 x_3^2 + x_1 x_2 x_3 +
x_2^2 x_3^2 + a_6$ and, for any $r \ge 4$ and any $k$ with $1 \le k \le r - 3$,
$S_r(x_1, \dots, x_r) := \mathrm{Res}_X\big(S_{r-k}(x_1, \dots, x_{r-k-1}, X),\ S_{k+2}(x_{r-k}, \dots, x_r, X)\big)$.
For $r \ge 2$, the polynomial $S_r$ is symmetric and has degree $2^{r-2}$ in every variable $x_i$ m.
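The defining property of $S_3$ can be checked numerically on a toy curve. The sketch below is only an illustration under stated assumptions: the field $\mathbb{F}_{2^5}$ with reduction polynomial $x^5 + x^2 + 1$, the coefficients $a_2 = a_6 = 1$ and all function names are choices made here, not taken from the paper. It enumerates the affine points, adds every pair with the usual binary-curve addition law, and checks that the three x-coordinates of $P_1$, $P_2$ and $-(P_1 + P_2)$ are a root of $S_3$.

```python
# Toy check of Semaev's S_3 on y^2 + xy = x^3 + a2*x^2 + a6 over GF(2^5).
# Field size, modulus and coefficients are illustrative assumptions.
N = 5
MOD = 0b100101          # x^5 + x^2 + 1, irreducible over GF(2)
A2, A6 = 1, 1

def gf_mul(a, b):
    """Multiply two field elements (bit i = coefficient of x^i)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> N:      # degree reached N, reduce
            a ^= MOD
    return r

def gf_inv(a):
    """Inverse of a nonzero element as a^(2^N - 2)."""
    r = 1
    for _ in range(2 ** N - 2):
        r = gf_mul(r, a)
    return r

def on_curve(x, y):
    x2 = gf_mul(x, x)
    return gf_mul(y, y) ^ gf_mul(x, y) == gf_mul(x2, x) ^ gf_mul(A2, x2) ^ A6

def neg(P):
    x, y = P
    return (x, x ^ y)

def add(P, Q):
    """Affine addition; None stands for the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2:
        if y1 ^ y2 == x1:               # Q = -P (covers the 2-torsion point x = 0)
            return None
        lam = x1 ^ gf_mul(y1, gf_inv(x1))            # doubling slope
        x3 = gf_mul(lam, lam) ^ lam ^ A2
    else:
        lam = gf_mul(y1 ^ y2, gf_inv(x1 ^ x2))       # chord slope
        x3 = gf_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ A2
    y3 = gf_mul(lam, x1 ^ x3) ^ x3 ^ y1
    return (x3, y3)

def s3(x1, x2, x3):
    sq = lambda v: gf_mul(v, v)
    return (gf_mul(sq(x1), sq(x2)) ^ gf_mul(sq(x1), sq(x3))
            ^ gf_mul(sq(x2), sq(x3)) ^ gf_mul(gf_mul(x1, x2), x3) ^ A6)

def check_s3_on_all_sums():
    """Return (number of triples checked, number of violations)."""
    points = [(x, y) for x in range(2 ** N) for y in range(2 ** N) if on_curve(x, y)]
    checked = failures = 0
    for P1 in points:
        for P2 in points:
            R = add(P1, P2)
            if R is None:
                continue                 # no affine third point in this case
            P3 = neg(R)                  # P1 + P2 + P3 = point at infinity
            checked += 1
            if not on_curve(*P3) or s3(P1[0], P2[0], P3[0]) != 0:
                failures += 1
    return checked, failures
```

Only the direction "points summing to infinity give a root of $S_3$" is tested; the converse may require y-coordinates in an extension field.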

Summation polynomials were used by Gaudry 123, Joux and Vitse m and 
Diem |2(I21j to compute relations in index calculus algorithms for elliptic curves 
over composite fields. The following variant is an adaptation of Diem E3- 

1. Factor Basis definition. Fix two integers $m, n' < n$ with $mn' \approx n$ and a
vector space $V \subset \mathbb{F}_{2^n}/\mathbb{F}_2$ of dimension $n'$. Let $\mathcal{F}_V := \{(x, y) \in E(K) \mid x \in V\}$
be the factor basis.

2. Relation search. Find about $2^{n'}$ relations $a_iP + b_iQ = \sum_{j=1}^{m} P_{ij}$ with
$P_{ij} \in \mathcal{F}_V$. For each relation,

(a) Compute $R_i := a_iP + b_iQ$ for random integers $a_i, b_i$.

(b) Solve Semaev’s polynomial $S_{m+1}(x_1, \dots, x_m, (R_i)_x) = 0$ with the constraints
$x_i \in V$.

(c) If there is no solution, go back to (a).

3. Linear Algebra. Perform linear algebra on the relations to recover the 
discrete logarithm value. 

In previous works [3712 012 1 BIT) , a Weil descent was applied to Semaev’s poly- 
nomials and the resulting systems were solved with resultants or Grobner basis 
algorithms. In these works, the complexity of the relation search step was de- 
rived from the complexity of solving generic systems. However as pointed out 
in |33I34| and further demonstrated in Sectional of the present paper, polynomial 
systems arising from a Weil descent are very far from generic ones. 


A New Complexity Analysis. We now revisit Diem’s algorithm |23 and 
its analysis by m in accordance with our new analysis of Problem (QJ. Let 
n, m, n' be integer numbers. Before starting Diem’s algorithm, the (m+l)th sum- 
mation polynomial must be computed. Using Collins’ evaluation/interpolation 
method E3 for the resultant, this can be done in time approximately $2^{t_1}$, where
$t_1 \approx m(m+1)$. We then compute about $2^{n'}$ relations. To obtain these relations,
we solve special instances of Problem 0 where $f(x_1, \dots, x_m) := S_{m+1}(x_1, \dots,
x_m, (a_iP + b_iQ)_x)$ has degree $2^{m-1}$ with respect to every variable. Since Semaev’s
polynomials are clearly not random ones, we perform additional experiments. 

In our experiments, we apply Diem’s algorithm to a randomly chosen binary 
curve $E : y^2 + xy = x^3 + a_2 x^2 + a_6$ defined over $\mathbb{F}_{2^n}$, where $n \in \{11, 17\}$. We first
fix $m \in \{2, 3\}$ and $n' := \lceil n/m \rceil$. We then generate a random vector space $V$ of
dimension $n'$ and a random point $R$ on the curve such that $f$ has solutions. As
in Section 4, we finally use the Groebner function of Magma to solve Semaev’s

4 To compute S m+ i, we apply Collins’ algorithm on Sk where k = \ ■ This 
polynomial has degree 2 h m W/ 2 l ; n eac h variable. Following Collins, Theorem 9, 
we have t\ < 2 (m + l)m/2 = m(m + 1). 


On Polynomial Systems Arising from a Weil Descent 461 


Table 2. Average maximal degree reached in Grobner Basis experiments, average 
computation time (in seconds) and maximal memory requirements (in MB) for Semaev 
polynomials. (R): Random curves. (K): Koblitz curves. 



equation $S_{m+1}(x_1, \dots, x_m, R_x) = 0$ with the linear constraints. We repeat this
experiment 100 times for each parameter set, then we repeat all our experiments
with the Koblitz curve $E : y^2 + xy = x^3 + x^2 + 1$. The average value of the maximal
degrees reached during the computation, the average computation time and the
maximal memory requirements are reported in Table 2.

In all cases, the maximal degrees reached in the computations were even
below the first fall degree bound given by Proposition 1. This phenomenon is
probably due to the sparsity of Semaev’s polynomials and will be exploited in
future work (in particular, the degree of $S_{m+1}$ with respect to every variable
is $2^{m-1}$ but bounded by $2^m - 1$ in the analysis of Section 4). From now on in
the analysis, we ignore this difference and analyze Semaev’s polynomials as the
random polynomials of Section 4.

Assumption 2. Assumption 1 still holds if $f$ is generated from Semaev’s polynomials
as in the experiments of this section.

Under Assumption 2, Step 2(b) of Diem’s algorithm can be solved using a
dedicated Gröbner basis algorithm taking advantage of the block structure, in
time $(n')^{\omega D}$, where $D \approx m^2 + 1$ and $\omega$ is the linear algebra constant. Once the
$x$ components of a relation have been computed, the $y$ components can be found
by solving $m$ quadratic equations and testing each possible combination of the
solutions. This requires time roughly $2^m$, which can be neglected. On average,
the probability that a point $R_i := a_iP + b_iQ$ can be written as a sum of $m$
points from the factor basis can be heuristically approximated by $\frac{2^{mn'}}{m!\,2^n}$.

Assuming $mn' \approx n$, the total cost of the relation search step can therefore be
approximated by $2^{t_2}$, where $t_2 \approx m \log m + n' + \omega(m^2 + 1) \log n'$.

The last step of Diem’s algorithm consists in (sparse) linear algebra on a
matrix of rank about $2^{n'}$ with about $m$ elements of size about $n$ bits per row.
This step takes a time approximately equal to $mn2^{\omega' n'} = 2^{t_3}$, where $t_3 \approx \log m +
\log n + \omega' n'$ and $\omega'$ is the sparse linear algebra constant. If Assumption 2 holds
and if $mn' \approx n$, the total time taken by Diem’s algorithm can be estimated by
$T := 2^{t_1} + 2^{t_2} + 2^{t_3}$, where $t_1, t_2, t_3$ are defined as above.


On the Hardness of ECDLP in Characteristic 2. We now evaluate the
hardness of the elliptic curve discrete logarithm problem over the field $\mathbb{F}_{2^n}$ for
“small” values of $n$. In our estimations, we use $\omega = \log(7)/\log(2)$ and $\omega' = 2$.
Table 3. Complexity estimates for Diem’s algorithm in characteristic 2



We consider $n \in \{50, 100, 160, 200, 500, 1000, 2000, 2500, 5000, 10^4, 2 \cdot 10^4, 5 \cdot
10^4, 10^5, 2 \cdot 10^5, 5 \cdot 10^5, 10^6\}$ and $m \in \{2, \dots, n/2\}$. For every pair of values, we
compute values $t_1$, $t_2$ and $t_3$ as above. Finally, we approximate the total running
time of Diem’s algorithm by $2^{t_{max}}$ where $t_{max} := \max(t_1, t_2, t_3)$. For every value
of $n$, Table 3 presents the data corresponding to the value $m$ for which $t_{max}$
is minimal. We point out that the numbers obtained here have to be taken
cautiously since they all rely on Assumption 2 and involve some approximations.
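The estimation procedure just described can be sketched in a few lines. The code below is a minimal illustration, not the authors' script: it evaluates the exponents $t_1 \approx m(m+1)$, $t_2 \approx m \log m + n' + \omega(m^2+1) \log n'$ and $t_3 \approx \log m + \log n + \omega' n'$ stated in the text, with logarithms in base 2, and it assumes the real-valued choice $n' = n/m$ (the paper only requires $mn' \approx n$). Function names are hypothetical.

```python
import math

OMEGA = math.log2(7)      # dense linear algebra constant used in the text
OMEGA_SPARSE = 2.0        # sparse linear algebra constant used in the text

def diem_exponents(n, m):
    """Return (t1, t2, t3) for field size parameter n and m points per relation.

    Assumption of this sketch: n' is taken as the real number n / m.
    """
    n_prime = n / m
    t1 = m * (m + 1)
    t2 = m * math.log2(m) + n_prime + OMEGA * (m * m + 1) * math.log2(n_prime)
    t3 = math.log2(m) + math.log2(n) + OMEGA_SPARSE * n_prime
    return t1, t2, t3

def best_parameters(n):
    """Scan m in {2, ..., n/2} and return (m, t_max) minimising t_max."""
    best = None
    for m in range(2, n // 2 + 1):
        t_max = max(diem_exponents(n, m))
        if best is None or t_max < best[1]:
            best = (m, t_max)
    return best

if __name__ == "__main__":
    for n in (160, 2000, 5000):
        m, t_max = best_parameters(n)
        # n / 2 is the exponent of generic (square-root) algorithms
        print(n, m, round(t_max, 1), "generic:", n / 2)
```

Because of the approximations involved, the printed exponents should be read as rough indicators rather than as a reproduction of Table 3.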

According to our estimations, Diem’s version of index calculus (together with 
a sparse Grobner basis algorithm) beats generic algorithms for any n > N, 
where $N$ is an integer close to 2000. An actual attack for current cryptographically
recommended parameters ($n \approx 160$) seems to be out of reach today, but
the numbers in m suggest that medium-size parameters could be reachable 
with additional Grobner basis heuristics like the hybrid method jjjj ■ Large prime 
variations of Diem’s algorithm may also lead to substantial improvements 
in practice. This will be investigated in further work. 

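Letting $n$ grow and fixing $n' := n^{\alpha}$ and $m := n^{1-\alpha}$ for a positive constant
$\alpha < 1$, we obtain

$t_1 \approx n^{2(1-\alpha)}$,

$t_2 \approx (1 - \alpha)\, n^{1-\alpha} \log n + n^{\alpha} + \omega\alpha\, n^{2(1-\alpha)} \log n$,

$t_3 \approx (2 - \alpha) \log n + \omega' n^{\alpha}$.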

Taking a := 2/3, the relation search dominates the complexity of the index 
calculus algorithm and we deduce the following result 0 

Proposition 4. Under Assumption 2, the discrete logarithm problem over $\mathbb{F}_{2^n}$
can asymptotically be solved in time $O(2^{c\,n^{2/3} \log n})$, where $c := 2\omega/3$ and $\omega$ is the
linear algebra constant.


In particular, if the Gaussian elimination algorithm is used for linear algebra, we
have $\omega = 3$ and $c = 2$. We stress that Proposition 4 holds even when $n$ is prime.
Until now, the best complexity estimates obtained in that case corresponded to
generic algorithms that run in time $2^{n/2}$.
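For the reader's convenience, the constant in the exponent can be recovered by substituting $n' = n^{2/3}$ and $m = n^{1/3}$ into the three exponents defined earlier ($t_1 \approx m(m+1)$, $t_2 \approx m \log m + n' + \omega(m^2+1)\log n'$, $t_3 \approx \log m + \log n + \omega' n'$). The following is a sketch of that substitution, keeping only leading terms:

```latex
\begin{align*}
t_1 &\approx m(m+1) \approx n^{2/3},\\
t_2 &\approx m \log m + n' + \omega (m^2 + 1) \log n'
     \approx \tfrac{1}{3}\, n^{1/3} \log n + n^{2/3} + \tfrac{2\omega}{3}\, n^{2/3} \log n,\\
t_3 &\approx \log m + \log n + \omega' n'
     = \tfrac{4}{3} \log n + \omega' n^{2/3}.
\end{align*}
```

The term $\tfrac{2\omega}{3}\, n^{2/3} \log n$ in $t_2$ dominates all others as $n$ grows, which gives $c = 2\omega/3$, and $c = 2$ for $\omega = 3$.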

5 Note that the weaker bound $D_{reg} \le 2mt$ derived in Section 21 with Macaulay’s bound
also leads to a subexponential complexity but with a constant $c = 4\omega/3$.
6 Conclusion and Perspectives

In this paper, we revisited the complexity of solving polynomial systems arising 
from a Weil descent, a class of polynomial systems previously introduced by 
Faugere et al. j.'tllll lj . We observed that these systems can be seen as natural 
extensions of HFE systems and we generalized various results on HFE. Based on 
experimental results and heuristic arguments, we conjectured that the degrees of
regularity of these systems are only slightly larger than their original degrees, and
we deduced new heuristic bounds on their resolution. Interestingly, our bounds 
nicely generalize previous bounds on HFE. 

The most prominent consequence of our analysis so far concerns the elliptic
curve discrete logarithm problem (ECDLP) over binary fields. Indeed, our
heuristic analysis suggests that ECDLP can be solved in subexponential time
$O(2^{c\,n^{2/3} \log n})$ over the binary field $\mathbb{F}_{2^n}$, where $c$ is a constant smaller than 2.
This complexity is obtained with an index calculus algorithm due to Diem m 
and a block-structured Grobner basis algorithm. In practice, our estimations 
predict that the resulting algorithm is faster than generic algorithms (previ- 
ously thought to be the best algorithms for this problem) for any n larger than 
N, where N is an integer approximately equal to 2000. In particular, binary 
elliptic curves of currently recommended sizes ($n \approx 160$) are not immediately
threatened. 

Our complexity estimates are based on heuristic assumptions that differ from 
other index calculus algorithms, but are common in algebraic cryptanalysis. The 
polynomial systems appearing in the cryptanalysis of HFE have been intensively 
studied in the last 15 years, yet we have no definitive proof for their commonly 
admitted complexity. Our paper broadens the interest of this research to all
polynomial systems arising from a Weil descent and to their various applications.
We leave further experimental and theoretical investigation of our heuristic
assumptions to future work.

To conclude this paper, we point out that most of our results generalize quite 
easily to other fields, resulting in comparable asymptotic complexities. 
Abstract. The performance of the elliptic curve method (ECM) for integer
factorization plays an important role in the security assessment
of RSA-based protocols as a cofactorization tool inside the number field 
sieve. The efficient arithmetic for Edwards curves found an application by 
speeding up ECM. We propose techniques based on generating and combining
addition-subtraction chains to optimize Edwards ECM in terms of
both performance and memory requirements. This makes our approach 
very suitable for memory-constrained devices such as graphics processing 
units (GPU). For commonly used ECM parameters we are able to lower 
the required memory up to a factor 55 compared to the state-of-the-art 
Edwards ECM approach. Our ECM implementation on a GTX 580 GPU 
sets a new throughput record, outperforming the best GPU, CPU and 
FPGA results reported in literature. 

Keywords: Elliptic curve factorization, cofactorization, addition- 
subtraction chains, twisted Edwards curves, parallel architectures. 


1 Introduction 

Today, more than 25 years after its invention by Hendrik Lenstra Jr., the elliptic 
curve method [23 (ECM) remains the asymptotically fastest integer factoriza- 
tion method for finding relatively small prime factors of large integers. Although 
it is not the fastest general purpose integer factorization method, when factoring 
a composite integer $n = pq$ with $p \approx q \approx \sqrt{n}$ the number field sieve |52I25| (NFS)
is asymptotically faster, it has recently received a renewed research interest due to 
the discovery of an interesting normal form for elliptic curves introduced by Ed- 
wards E3J. From a cryptologic point of view the practical performance of ECM is 
important since it is used to rapidly factor many small (up to one or two hundred 
bits) integers inside NFS. This is illustrated by the fact that it is estimated that 
five to twenty percent (cf. Sect ion EOl whv this is hard to estimate) of the total wall- 
clock time was spent in ECM in the current world-record factorization of a 768-bit 
RSA number [2Qj (and it is expected that this percentage will grow for larger fac- 
torizations). Using ECM as a tool to factor many small numbers inside NFS is 
an active research area by itself. Offloading this work to reconfigurable hardware 
such as field-programmable gate arrays is studied in 15711611 1 II 7125141 II while m
considers parallel architectures such as graphics processing units (GPUs) and the
Cell broadband engine architecture. A comparison between software and hard- 
ware based solutions is presented in El, Traditionally, ECM is implemented us- 
ing Montgomery curves |2S| and uses the various techniques described in jjJHl • The 
most-widely used ECM implementation is GMP-ECM pi 1| and this implementa- 
tion, or modifications to it, is responsible for setting all recent ECM record fac- 
torizations (see a description of some of these record factorizations in (Sj). After 
the invention of Edwards curves Bernstein et al. explored the possibility to use 
these curves in the ECM setting j3j. Hisil et al.Ba published a coordinate system 
for Edwards curves which results in the fastest known realization of curve arith- 
metic. A follow-up paper by Bernstein et al. discusses the usage of these “a = —1” 
twisted Edwards curves P for ECM. The speedup from switching to Edwards 
curves comes at a price, addition chains m (or addition-subtraction chains E3I) 
equipped with large windowing sizes jS| are used (cf. 0 for a summary of these 
techniques) . The memory requirement for Edwards ECM grows roughly linearly 
with the input parameters of ECM while a small constant number of residues mod- 
ulo n are sufficient when using Montgomery curves. 

In this paper we optimize ECM by exploiting the fact that the same scalar 
is often used when computing the elliptic curve scalar multiplication (ECSM), 
allowing one to prepare particularly good addition-subtraction chains for these 
fixed scalars. Our approach is inspired by the ideas used in the ECM implemen- 
tation by Dixon and Lenstra m from 1992. In jT2J the total cost to compute the 
ECSM, in terms of point doubling and point additions, is lowered by testing if 
the computation of the ECSM using batches of small prime products is cheaper 
(requires fewer point additions) than processing the primes one at a time (or all 
in one big batch). We generalize this idea: many billions of integers, which are 
constructed such that they can be computed using an addition-subtraction chain 
with a high doubling/ addition ratio, are tested for smoothness and factored. By 
fixing different popular elliptic curve scalar values used in ECM inside NFS we 
are able to combine some of these integers using a greedy approach. This results 
in a more efficient ECSM algorithm with a smaller memory footprint. To il- 
lustrate, compared to the cofactorization setting considered by Bernstein et al. 
in f- r )il| (using the parameter $B_1 = 2^{13}$) the techniques from this paper reduce
the memory by a factor 55. This makes our approach particularly interesting 
for environments where the memory (per thread) is constrained; e.g. GPUs. We 
illustrate the practical benefits by implementing this approach for GPUs: setting 
a new throughput speed record compared to the current CPU, GPU and FPGA 
based results reported in literature. The best addition-subtraction chains found 
for the various popular B\ values can be found online p 7 . 

This paper is organized as follows. After recalling the preliminaries in Section^] 
the notation and basic idea behind elliptic curve constant scalar multiplication 
is discussed in Section 01 Section 0] explains how to combine these chains such 
that they might result in a faster and more memory efficient ECM. Section 0 
explains a side-effect why certain chains require more modular multiplications 
and Section El presents the obtained results. Section Q concludes the paper. 
2 Preliminaries

2.1 The Elliptic Curve Method 

The elliptic curve method (ECM) for integer factorization EH is analogous to the 
Pollard p— 1 integer factorization method ra and attempts to factor a composite 
integer n. The general idea behind ECM is as follows (we follow the description 
from El). First, pick a random point P and construct an elliptic curve E over 
$\mathbb{Z}/n\mathbb{Z}$ such that $P \in E(\mathbb{Z}/n\mathbb{Z})$ (cf. [221 Sec. 2.B]). Next, compute the elliptic
curve scalar multiplication $Q = kP \in E(\mathbb{Z}/n\mathbb{Z})$. The positive integer $k$ is selected
such that it is divisible by many small prime powers: e.g. $k = \mathrm{lcm}(1, 2, \dots, B_1)$
for some bound $B_1 \in \mathbb{Z}$. If for a prime $p$ dividing $n$ the order $\#E(\mathbb{F}_p)$ is $B_1$-
powersmooth (an integer is defined to be $B$-powersmooth if none of the prime
powers dividing this integer is greater than $B$) then $\#E(\mathbb{F}_p) \mid k$. In other words,
$Q = kP$ and the neutral element of the curve become the same modulo $p$. In
this event we have $p \mid \gcd(n, Q_z)$, where $Q_z$ is the $z$-coordinate of the point $Q$
when using projective Weierstrass coordinates. If $\gcd(n, Q_z) \ne n$ then we have
split $n$.
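Stage 1 as described above can be illustrated with a short, unoptimized sketch. It is not the implementation discussed in this paper: for brevity it uses affine short Weierstrass arithmetic, where a factor shows up as a failed modular inversion (Lenstra's original formulation), instead of projective coordinates with a final $\gcd(n, Q_z)$. The function names, the random curve selection and the default parameters are assumptions made for this example.

```python
import math
import random

class FactorFound(Exception):
    """Raised when a modular inversion fails and exposes a divisor of n."""
    def __init__(self, factor):
        super().__init__(factor)
        self.factor = factor

def inverse_mod(a, n):
    a %= n
    g = math.gcd(a, n)
    if g != 1:
        raise FactorFound(g)
    return pow(a, -1, n)

def ec_add(P, Q, a, n):
    """Add points on y^2 = x^3 + a*x + b over Z/nZ (b is implicit); None = infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % n == 0:
        return None
    if P == Q:
        lam = (3 * x1 * x1 + a) * inverse_mod(2 * y1, n) % n
    else:
        lam = (y2 - y1) * inverse_mod(x2 - x1, n) % n
    x3 = (lam * lam - x1 - x2) % n
    y3 = (lam * (x1 - x3) - y1) % n
    return (x3, y3)

def ec_mul(k, P, a, n):
    """Left-to-right double-and-add scalar multiplication."""
    R = None
    for bit in bin(k)[2:]:
        R = ec_add(R, R, a, n)
        if bit == "1":
            R = ec_add(R, P, a, n)
    return R

def lcm_upto(bound):
    k = 1
    for i in range(2, bound + 1):
        k = k * i // math.gcd(k, i)
    return k

def ecm_stage1(n, b1=50, curves=200, seed=1):
    """Try `curves` random curves; return a nontrivial factor of n or None."""
    rng = random.Random(seed)
    k = lcm_upto(b1)                      # k = lcm(1, ..., B1)
    for _ in range(curves):
        x, y, a = (rng.randrange(n) for _ in range(3))
        try:
            ec_mul(k, (x, y), a, n)       # b = y^2 - x^3 - a*x is never needed
        except FactorFound as hit:
            if hit.factor != n:
                return hit.factor
    return None
```

For instance, `ecm_stage1(455839)` is expected to return one of the two prime factors of $455839 = 599 \cdot 761$; which one depends on the random curves drawn.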

Hasse proved (see e.g. pill Theorem 1.1]) that the order $\#E(\mathbb{F}_p)$ is in the
interval $[p + 1 - 2\sqrt{p},\ p + 1 + 2\sqrt{p}]$. The advantage of ECM is that one can
randomize the group order by trying different curves. It has been shown in El
that the (heuristic) run-time of ECM depends mainly on $p$, the smallest non-
trivial prime divisor of $n$, and can be expressed as

$O\big(\exp\big((\sqrt{2} + o(1))\sqrt{\log p \log\log p}\big)\, M(\log n)\big)$

where $M(\log n)$ represents the complexity of multiplication modulo $n$ and the
$o(1)$ is for $p \to \infty$. The approach described here is often referred to as “stage 1”.
There is a “stage 2” continuation for ECM which takes as input a bound $B_2 \in \mathbb{Z}$
and succeeds (in factoring $n$) if $Q = kP$ has prime order $\ell$ (for $B_1 < \ell < B_2$) in
$E(\mathbb{F}_p)$. This means that $\#E(\mathbb{F}_p)$ is $B_1$-powersmooth except for one prime factor
which is below $B_2$. There are several techniques [10,26,27] how to perform stage
2 efficiently. In the following we will focus on stage 1 only.

2.2 Cofactorization Using ECM 

The relation collection phase, one of the two main phases of NFS, generates a lot 
of composite integers which need to be tested for powersmoothness. This is done 
using different factorization techniques and is denoted as the cofactorization 
phase. To illustrate, the total time spent in the cofactorization procedure was 
roughly one third of the sieving time when factoring the 768-bit RSA modulus 
in EH- Note that this one third includes the time of pseudo primality tests and 
different factorization methods: quadratic sieve Eli; Pollard p — 1 EH and ECM. 
In this cofactorization phase only composites up to 140 bits were considered and 
ECM was used only for composites up to 109 bits. The parameters for ECM 
varied depending on the size of the composites and ranged from $B_1 = 150$ to
$B_1 = 500$ where often only a single curve was tried with a maximum of around


470 J.W. Bos and T. Kleinjung 


Table 1 . Performance comparison between GMP-ECM and EECM-MPFQ using the 
“a = —1” twisted Edwards curves in terms of modular multiplications (M) and squar- 
ings (S) together with the required number of residues modulo n (R) which needs to 
be kept in memory. 


          GMP-ECM                             EECM-MPFQ
B1        #S      #M      #S+#M    #R         #S      #M      #S+#M    #R
256       1066    2025    3091     14         1436    1638    3074     38
512       2200    4210    6410     14         2952    3183    6135     62
1024      4422    8494    12916    14         5892    6144    12036    134
8192      35508   68920   104428   14         47156   45884   93040    550


eight curves. Observing the trend of past record factorizations, it is conceivable 
that cofactorization becomes more important in bigger factorizations (cf. jS] for 
more detailed arguments about the significance of ECM in NFS) . 


2.3 Montgomery versus Edwards Curves 

The main motivation to use Edwards (over Montgomery) curves is performance. 
There is one implementation of ECM using Edwards curves available: EECM- 
MPFQ. This implementation includes the “a = 1” Edwards curves approach 
from j3j and the “a = —1” Edwards curves approach from [I]. The a = — 1 
Edwards ECM approach is the fastest in practice and we use this as the base 
setting to compare to. Table 1 compares the number of multiplications
and squarings required in GMP-ECM and EECM-MPFQ for different typical
$B_1$ values used in ECM when used as a cofactorization method in NFS. These
numbers show that using Edwards curves results in fewer modular multiplications
and squarings. However, the required storage for GMP-ECM (Montgomery
curves) is independent of $B_1$ while it grows almost linearly with the size of $B_1$
and is significantly higher, due to the use of windowing based methods, for
EECM-MPFQ (Edwards curves, see 0 Table 4.1]). 

3 Elliptic Curve Constant Scalar Multiplication 

Most of the addition-subtraction chains based algorithms in practice use a w- 
bit windowing technique, for some (optimal) width w, to reduce the number of 
required elliptic curve additions. The total number of additions may be signifi- 
cantly reduced by using this approach but one also needs to store more points: 
$2^{w-1}$ when using sliding windows [38]. In environments where the available memory
per thread is low, these methods cannot be used or one is forced to settle
for a suboptimal window size. A prime example of such a platform are graphics 
processing units (GPUs); one of the latest GPU architectures m (Fermi) shares 
64 kilobyte fast shared memory per 32 processors and each processor typically 
time-shares multiple threads (e.g., 16 to 32 corresponding to 128 to 64 bytes per 
thread). 
We investigate two approaches to lower the number of elliptic curve additions
and the storage required to compute the scalar product. Our approach is inspired
by the results reported by Dixon and Lenstra [12]. Suppose we have a scalar
$k = \mathrm{lcm}(1, \dots, B_1) = \prod_i p_i$, where the $p_i$ are primes which can occur multiple
times. Typically, the ECSM is implemented processing one such $p_i$ at a time BSI.
In [12] it is suggested to process the $p_i$ in batches, i.e. multiply a batch of $p_i$’s at
a time such that the weight of the product $w(\prod_i p_i)$, the number of ones in the
binary representation of $\prod_i p_i$, is (much) lower than the sum of the individual
weights $\sum_i w(p_i)$. If this is the case then the number of required EC-additions
is reduced when using the straightforward double-and-add approach (which
does not require storing any additional precomputed points). Such low-weight
products can be constructed by greedily searching through $b$-tuples of the $p_i$
where $b$ is small. In [12] $b$ was at most 3 which reduced the total weight by
approximately a factor three. As an example the following triple is given

$1028107 \cdot 1030639 \cdot 1097101 = 1162496086223388673$,
$w(1028107) = 10$, $w(1030639) = 16$, $w(1097101) = 11$,
$w(1162496086223388673) = 8$,

where the product of primes of weights 10, 16, and 11 results in an integer of weight
eight. The resulting composite integer can be computed using an addition chain
requiring only seven additions and 60 doublings using the naive double-and-add
algorithm.
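The arithmetic of this example is easy to verify. The following sketch (helper names are illustrative, not from the paper) computes binary weights and the operation counts of plain left-to-right double-and-add, which needs one doubling per bit after the leading one and one addition per further set bit.

```python
def weight(k):
    """Number of ones in the binary representation of k."""
    return bin(k).count("1")

def double_and_add_cost(k):
    """Return (doublings, additions) of naive double-and-add for scalar k."""
    return k.bit_length() - 1, weight(k) - 1

primes = (1028107, 1030639, 1097101)
product = primes[0] * primes[1] * primes[2]

if __name__ == "__main__":
    print([weight(p) for p in primes])          # individual weights
    print(product, weight(product))             # batched product and its weight
    print(double_and_add_cost(product))         # (doublings, additions) for the batch
    # additions needed when the three primes are processed one at a time
    print(sum(double_and_add_cost(p)[1] for p in primes))
```

Processing the three primes separately costs 9 + 15 + 10 = 34 additions, against 7 for the batched product, which is the saving the batching idea exploits.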

In this section we explore different methods to find numbers which can be con- 
structed using even better (higher) doubling/ addition ratios. These methods do 
not aim to construct sequences by combining the different Pi (as in ca) but we 
propose an opposite approach by factoring many integers which are the result of 
addition-subtraction chains with high doubling/addition ratios and subsequently 
combining these integers such that all pi s are used. These addition-subtraction 
chains are constructed such that they do not require any large lookup tables. 
Notice that the information encoding the sequence of arithmetic operations has 
to be stored (in all approaches). This does not pose a problem since this in- 
formation is constant and can be shared among all the computational units (or 
streamed to the units or even hardcoded) and hence does not result in additional 
overhead in practice. 

In the remainder of the paper we denote addition-subtraction chains simply 
as chains. 

3.1 Chains with Restrictions 

In order to generate integers which can be computed using a chain with a high
doubling/addition ratio we need to construct and denote chains of a certain
length $m$. A chain is a sequence of doublings, additions and subtractions denoted
by $D$, $A$ and $S$ respectively. A doubling can always be assumed to apply to the
previously generated element in the chain (instead of doubling any previous
element), since one can reorder the symbols such that doubling always occurs
on the last element without changing the result of the chain. In some cases this
might result in a shorter (more efficient) sequence when the same element is
doubled multiple times. Let us define the set of symbols $\mathcal{O}$ as

$\mathcal{O} = \{D\} \cup \{A_{i,j} \mid i, j \in \mathbb{Z},\ i > j\} \cup \{S_{i,j} \mid i, j \in \mathbb{Z},\ i > j\}$,

where the subscripts indicate on which element in the chain we compute (this is
made more precise later). The set of all $m$-tuples, ordered lists of $m$ elements, of
symbols in $\mathcal{O}$ with the restriction that no elements can be used which have not
yet been generated is

$\mathcal{O}_m = \{(o_{m-1}, \dots, o_0) \in \mathcal{O}^m \mid o_k \in \{D\} \cup \{A_{i,j} \mid i \le k\} \cup \{S_{i,j} \mid i \le k\},\ 0 \le k < m\}$.

In order to construct a chain from such an $m$-tuple of symbols we define functions
$\sigma_m : \mathcal{O} \times \mathbb{Z}^{m+1} \to \mathbb{Z}^{m+2}$ such that $(o, (t_m, \dots, t_0)) \mapsto (t_{m+1}, t_m, \dots, t_0)$ where

$t_{m+1} = \begin{cases} 2t_m & \text{if } o = D, \\ t_i + t_j & \text{if } o = A_{i,j}, \\ t_i - t_j & \text{if } o = S_{i,j}. \end{cases}$

Given an $m$-tuple of symbols $(o_{m-1}, \dots, o_0) \in \mathcal{O}_m$ the $(m + 1)$-tuple of integers
associated to this chain is $\sigma_{m-1}(o_{m-1}, \sigma_{m-2}(o_{m-2}, \dots, \sigma_0(o_0, 1) \dots))$ and the
resulting integer produced by this chain is $t_m$. As an example consider the 7-tuple
of symbols $(S_{6,0}, D, D, A_{3,0}, D, D, D) \in \mathcal{O}_7$ which corresponds to the 8-tuple of
integers in the chain $(35, 36, 18, 9, 8, 4, 2, 1)$ computed as

$\sigma_6(S_{6,0}, \sigma_5(D, \sigma_4(D, \sigma_3(A_{3,0}, \sigma_2(D, \sigma_1(D, \sigma_0(D, 1)))))))$.

The function $\sigma_m$ is the correspondence between a tuple of symbols and the actual
chain. The example shows how to compute the resulting integer 35 using one
subtraction, one addition and five doublings.
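The evaluation of a tuple of symbols can be mirrored directly in code. In the sketch below (the encoding of symbols as Python tuples and the function name are assumptions made for this illustration) the operations are listed in execution order, i.e. $o_0$ first, and each step appends the next integer $t_{k+1}$ to the chain.

```python
def eval_chain(ops):
    """Evaluate a chain given in execution order (o_0 first).

    Each op is ("D",), ("A", i, j) or ("S", i, j); indices refer to
    positions in the list of integers produced so far, with t_0 = 1.
    """
    t = [1]
    for op in ops:
        if op[0] == "D":
            t.append(2 * t[-1])               # double the last element
            continue
        kind, i, j = op
        if not 0 <= j < i < len(t):
            raise ValueError("operand not generated yet: %r" % (op,))
        if kind == "A":
            t.append(t[i] + t[j])
        elif kind == "S":
            t.append(t[i] - t[j])
        else:
            raise ValueError("unknown symbol: %r" % (op,))
    return t

# The 7-tuple (S_{6,0}, D, D, A_{3,0}, D, D, D), written in execution order.
example = [("D",), ("D",), ("D",), ("A", 3, 0), ("D",), ("D",), ("S", 6, 0)]

if __name__ == "__main__":
    print(eval_chain(example))
```

The returned list is the chain $t_0, \dots, t_7$; its last entry is the integer produced by the chain.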

The set of tuples $\mathcal{O}_m$ consists of the most generic type of chains; a significant
number of tuples correspond to chains which perform useless (unnecessary)
computations. An example is computing the addition (or subtraction) of
two previous values without using this result. To address this we define a more
restricted set of tuples $\mathcal{P}_m \subset \mathcal{O}_m$ as

$\mathcal{P}_m = \{(o_{m-1}, \dots, o_0) \in \mathcal{O}_m \mid o_k \in \{D\} \cup \{A_{i,j} \mid i = k\} \cup \{S_{i,j} \mid i = k\},\ 0 \le k < m\}$.

These additional restrictions ensure that, just as for the doubling, we only add or 
subtract to the last integer in the sequence to obtain the next one. Such chains 
are known as Brauer chains or star addition chains [ I iSl Section C6]. 

In this setting we write $A_j$ and $S_j$ for $A_{i,j}$ and $S_{i,j}$, respectively, and $k > 0$
subsequent instances of $D$ are denoted by $D^k$. The previous example can now
be written as $S_0 D^2 A_0 D^3 \in \mathcal{P}_7$ by abusing the notation: omitting the brackets
and commas. In practice we would generate sequences of symbols such that
a number of elliptic curve additions $A$ and doublings $D$ are fixed and look at
sequences of symbols of length $m = A + D$ which use $A$ times $A_j$ or $S_j$ and
$D$ times $D$. Different tuples might compute the same integer result. Using our
example, the number 35 can be obtained with $D = 5$ and $A = 2$ in different
ways

$35 = (2^3 + 1) \cdot 2^2 - 1$, i.e. $S_0 D^2 A_0 D^3 \in \mathcal{P}_7$,
$35 = (2^4 + 1) \cdot 2 + 1$, i.e. $A_0 D A_0 D^4 \in \mathcal{P}_7$.

3.2 Generating Chains 

We discuss how to efficiently generate the resulting integers t m in a low-storage 
and no-storage setting. 

The Low-Storage Setting. Let $A$ be the number of elliptic curve additions
and $D$ the number of elliptic curve doublings (with $D > A$). The generation
of all the tuples in $\mathcal{P}_m$, with $m = A + D$, results in many identical integers
$t_m$. Removing these duplicate integers can be achieved by first generating and
storing all the resulting integers and subsequently sorting and keeping exactly
one of consecutive equal integers. To avoid storing all the resulting integers for a
given pair $(A, D)$, which requires a significant amount of storage as we will see
later, and to avoid sorting this huge data set we define a more restricted set of
rules $\mathcal{Q}_m \subset \mathcal{P}_m \subset \mathcal{O}_m$ as follows

$\mathcal{Q}_m = \{(o_{m-1}, \dots, o_0) \in \mathcal{P}_m \mid o_0 = D,\ o_{m-1} \in \{A_i, S_i\}$ and for $0 < k \le m - 1$:
$o_k \in \{D\} \cup \{A_i, S_i\},\ o_k \in \{A_i, S_i\} \Rightarrow o_{k-1} = D \land (i = 0 \lor o_{i-1} \in \{A_\ell, S_\ell\})\}$.

The restrictions used in the definition of $Q_m$ ensure that the resulting integer is
odd and only addition (or subtraction) of an odd number to the current (even)
number is allowed. This approach significantly reduces the number of chains
which produce the same resulting integer at the cost of slightly reducing the
number of unique integers produced. To illustrate, for $D = 50$ the total number
of tuples generated by $V_{53}$ is more than 140 times higher compared to $Q_{53}$ while
the number of unique odd resulting integers is only 1.09 times higher.

The list of $m + 1$ integers $u_i$ corresponding to the $m$-tuple of symbols from
$Q_m$ can be efficiently generated recursively using

$$u_{i+1} = \begin{cases} 2u_i \\ u_i \pm u_j & \text{for } j < i \text{ and } 2 \mid u_i,\ 2 \nmid u_j \end{cases}$$

with $u_0 = 1$ and ensuring that the final operation is not a doubling (to make
the resulting integer odd). Hence, the next integer in the sequence can always
be obtained by doubling or adding a previous odd number $u_j$ to the current
even integer $u_i$. The required storage depends on which $u_j$ are used in
subsequent additions and at which indices they are used. In practice we generate all
sequences using a fixed number of doublings $D$ and additions $A$ making sure
that the resulting storage requirement is never too large.

A sequence of additions and doublings corresponding to the chains resulting
from $Q_m$ looks like

$$A_{i_{A-1}} D^{d_{A-1}} \cdots A_{i_1} D^{d_1} A_{i_0} D^{d_0} = (A_{i_{A-1}} D) D^{d_{A-1}-1} \cdots (A_{i_1} D) D^{d_1-1} (A_{i_0} D) D^{d_0-1} \qquad (1)$$

with $D = \sum_{i=0}^{A-1} d_i$, $d_i > 0$, and indices $i_j$ that satisfy the restrictions of $Q_m$,
i.e., $i_j$ takes one of the values $\sum_{g=0}^{h} (d_g + 1)$ for $-1 \le h < j$. Such a sequence
starts with a doubling, ends with an addition and an addition is always preceded
by a doubling. Hence, there are $\binom{D-1}{A-1}$ choices for the order of the $A - 1$ pairs
$(A_{i_j} D)$ and the $D - A$ doublings $D$. Since every addition can be substituted
by a subtraction the number of possibilities is multiplied by a factor $2^A$. The
indices $i_j$ can be chosen in $A!$ ways, hence the total number of resulting integers
produced by $Q_m$ is

$$\binom{D-1}{A-1} \cdot A! \cdot 2^A = 2^A \cdot A \cdot \prod_{i=1}^{A-1} (D - A + i).$$
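The counting argument can be cross-checked by brute force for small parameters. The sketch below is an illustration under assumptions, not the authors' generator: the function name `low_storage_integers` is hypothetical, and it simply follows the recursion $u_{i+1} \in \{2u_i, u_i \pm u_j\}$ with the restrictions that a chain starts with a doubling, ends with an addition or subtraction, and only adds a previously produced odd value directly after a doubling.

```python
from math import comb, factorial


def low_storage_integers(A, D):
    """End values of all chains with A additions/subtractions and D doublings.

    One entry is returned per chain, so duplicates are kept on purpose.
    """
    results = []

    def step(u, odds, adds_left, dbls_left, even):
        if adds_left == 0:
            # A chain ends with an addition, so every doubling must be used up.
            if dbls_left == 0:
                results.append(u)
            return
        if dbls_left > 0:
            step(2 * u, odds, adds_left, dbls_left - 1, True)
        if even:  # additions are only allowed directly after a doubling
            for v in odds:  # odd values seen so far, including u_0 = 1
                for w in (u + v, u - v):
                    step(w, odds + [w], adds_left - 1, dbls_left, False)

    step(1, [1], A, D, False)
    return results


chains = low_storage_integers(2, 5)
expected = comb(5 - 1, 2 - 1) * factorial(2) * 2**2
```

For $A = 2$ and $D = 5$ the enumeration produces as many chains as the closed formula predicts, and 35 is among the resulting integers.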

The No-Storage Setting. The second setting we consider is constructing
chains which do not require any additional stored points, besides the input and
output (and possibly some auxiliary variables required to calculate the elliptic
curve group operation). This means we are looking for integers which can be
computed using chains which only use doublings and add or subtract the input
point. We can define the set of tuples $\mathcal{R}_m \subset Q_m$ as
$\mathcal{R}_m = \{(o_{m-1}, \ldots, o_0) \in Q_m \mid o_k \in \{A_0, S_0, D\},\ 0 \le k < m\}$.
All resulting integers of no-storage chains
which can be constructed using $A$ elliptic curve additions and $D$ elliptic curve
doublings are of the form

$$2^D + \sum_{i=0}^{A-1} \pm 2^{n_i}, \quad \text{with } 0 = n_0 < n_1 < \ldots < n_i < \ldots < n_{A-1} < D.$$

This follows from (1) by setting $i_j = 0$; we have $n_i = \sum_{g=1}^{i} d_{A-g}$. Using the
same argument as in the low-storage setting the number of resulting integers
generated by $\mathcal{R}_m$ is $\binom{D-1}{A-1} \cdot 2^A$. Compared to the low-storage setting the number
is reduced by a factor of $A!$, reflecting the missing choice of the indices $i_j$.
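As a small illustration of the closed form for no-storage chains, the following Python sketch enumerates all integers $2^D + \sum_i \pm 2^{n_i}$ with $n_0 = 0$. The function name `no_storage_integers` is an assumption made for this example; it lists one value per chain, so equal integers reached by different chains appear more than once.

```python
from itertools import combinations, product
from math import comb


def no_storage_integers(A, D):
    """All values 2^D + sum(+-2^{n_i}) with 0 = n_0 < n_1 < ... < n_{A-1} < D."""
    values = []
    for rest in combinations(range(1, D), A - 1):
        exponents = (0,) + rest  # n_0 = 0 is forced by the final addition
        for signs in product((1, -1), repeat=A):
            values.append(2**D + sum(s * 2**n for s, n in zip(signs, exponents)))
    return values


values = no_storage_integers(2, 5)
```

With $A = 2$ and $D = 5$ this gives $\binom{4}{1} \cdot 2^2 = 16$ chains but only 14 distinct integers, which shows on a tiny scale why duplicates have to be handled.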

4 Combining Chains 

Recall that, given a bound $B_1$, we want to perform an elliptic curve scalar
multiplication with the integer $k = \prod_{i=1}^{\ell} p_i = \mathrm{lcm}(1, \ldots, B_1)$ where the product
ranges over $\ell$ (not necessarily distinct) primes. We can get rid of the problems
posed by the primes 2 in this product by noticing that they can be handled by a
sequence of doublings at the end of the ECSM and assuming in the following that
all $p_i$ are odd. The techniques from the previous section provide us with a lot of
integers which can be constructed using a known number of additions (here we
count subtractions as additions) and doublings. Since different chains can lead
to the same integer we pick for each of these integers one chain (preferably the
one with the lowest cost). In this way we get a list of distinct integers, each with
an associated chain. We index this list by an index set $I$ and call $s_i$ the integer
corresponding to $i \in I$. For $i \in I$ denote by $\mathrm{add}(s_i)$ resp. $\mathrm{dbl}(s_i)$ the number
of additions resp. doublings in the chain and by $\{s_{i,1}, \ldots, s_{i,t_i}\}$ the multiset of
the primes in the prime decomposition of $s_i$. Furthermore, let $\mathrm{cost}(s_i)$ be the
cost of performing a scalar multiplication with $s_i$ using the associated chain. A
reasonable choice for Edwards curves is $\mathrm{cost}(s_i) = 7\,\mathrm{dbl}(s_i) + 8\,\mathrm{add}(s_i) + 1$ which
will be discussed in the next section.

Ideally, we want to find a subset $I' \subset I$ such that $k \mid \prod_{i \in I'} s_i$ and $\sum_{i \in I'} \mathrm{cost}(s_i)$
is minimal. To facilitate our task we will modify this in two ways. If the product
in the first condition is bigger than $k$ we do more work than necessary. This
can lead to a lower cost, but we assume that replacing the first condition by
$k = \prod_{i \in I'} s_i$ will not increase the minimum of $\sum_{i \in I'} \mathrm{cost}(s_i)$ significantly. The
second modification is the replacement of $\sum_{i \in I'} \mathrm{cost}(s_i)$ by $\sum_{i \in I'} \mathrm{add}(s_i)$. To
explain why we think that this does not increase the minimum too much we
consider subsets $I'$ for which $\sum_{i \in I'} \mathrm{cost}(s_i)$ is close to the minimum. Then most
$s_i$ have a high ratio $\frac{\mathrm{dbl}(s_i)}{\mathrm{add}(s_i)}$ and therefore we have for most of them $s_i \approx 2^{\mathrm{dbl}(s_i)}$.
Since $\prod_{i \in I'} s_i = k$ the sum $\sum_{i \in I'} \mathrm{dbl}(s_i) \approx \log_2(k)$ does not vary too much.
Furthermore, the summand 1 in the cost function is the least significant term
and the cardinality of $I'$ does not vary much. We are aware that the second
modification is more delicate than the first one, but, as explained below, we will
generate many sets $I'$ and will pick the best one amongst them using the more
costly function $\mathrm{cost}(s_i)$.

The condition $k = \prod_{i \in I'} s_i$ implies that every $s_i$ in this product is
$B_1$-powersmooth, which suggests the following two stage approach:

1. Restrict $I$ to $\{i \in I \mid s_i \text{ is } B_1\text{-powersmooth}\}$.

2. Find a subset $I' \subset I$ such that the multisets $\bigcup_{i \in I'} \{s_{i,1}, \ldots, s_{i,t_i}\} =
{p_1, \ldots, p_\ell}$ coincide and that $\sum_{i \in I'} \mathrm{add}(s_i)$ is minimal.

Testing a large list of numbers for $B_1$-powersmoothness can be done using the
method from Section 4]. The main idea is to build a product tree from the
list, replace the root node $R$ (the product of all numbers of the list) by $k \bmod R$
(where $k = \mathrm{lcm}(1, \ldots, B_1)$ is precomputed) and then tree-wise replace each node
by the residue of $k$ modulo the node. The leaves resulting in 0 contained
$B_1$-powersmooth numbers and their factorizations can be obtained by other means.
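The product-tree idea can be sketched in a few lines of Python. This is a minimal illustration with plain Python integers, assuming hypothetical helper names `product_tree` and `powersmooth_flags`; a real implementation would rely on fast multi-precision arithmetic and a far larger batch of numbers.

```python
from functools import reduce
from math import gcd


def product_tree(numbers):
    """Levels of a product tree: leaves first, root (product of all) last."""
    levels = [list(numbers)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        nxt = [cur[i] * cur[i + 1] if i + 1 < len(cur) else cur[i]
               for i in range(0, len(cur), 2)]
        levels.append(nxt)
    return levels


def powersmooth_flags(numbers, B1):
    """True for every number that divides k = lcm(1, ..., B1)."""
    if not numbers:
        return []
    k = reduce(lambda a, b: a * b // gcd(a, b), range(1, B1 + 1), 1)
    levels = product_tree(numbers)
    residues = [k % levels[-1][0]]  # root: k mod R
    for level in reversed(levels[:-1]):
        # each node receives the residue of its parent, reduced modulo itself
        residues = [residues[i // 2] % node for i, node in enumerate(level)]
    return [r == 0 for r in residues]


flags = powersmooth_flags([35, 34, 64, 143], 16)
```

Here 35 and 143 divide $\mathrm{lcm}(1, \ldots, 16)$, whereas 34 contains the prime 17 and 64 is a prime power exceeding 16, so only the first and last leaves end up with residue 0.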

Finding an optimal set $I'$ is in general a difficult problem and has been studied
in m\- We choose to use a greedy approach which produces satisfactory results.
We start with an empty set $I'$ and the multiset $M = \{p_1, \ldots, p_\ell\}$ of primes to
be matched. As long as $M$ is non-empty we select an integer $s_i = \prod_{j=1}^{t_i} s_{i,j}$ with
$\{s_{i,1}, \ldots, s_{i,t_i}\} \subset M$ such that the ratio $\frac{\mathrm{dbl}(s_i)}{\mathrm{add}(s_i)}$ is high and replace $I'$ by $I' \cup \{i\}$
and $M$ by $M \setminus \{s_{i,1}, \ldots, s_{i,t_i}\}$. This may fail because we might not be able to
satisfy the condition $\{s_{i,1}, \ldots, s_{i,t_i}\} \subset M$ at a given point. There are several ways
to overcome this problem, e.g., we could increase our supply of $s_i$ by generating
more chains. Another way consists in aborting the greedy search at this point,
getting $k = c \cdot \prod_{i \in I'} s_i$ for some integer $c$. Using the method of Dixon/Lenstra, we
can search for a decomposition of $c$ into several factors, each having a good chain.
For the sizes of $B_1$ considered in this paper, namely $B_1 \le 8192$, $c$ consisted of
very few primes and was often 1. Therefore the usually lower doubling/addition
ratio of the $c$-part does not pose a problem for small $B_1$.
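A stripped-down version of this greedy search can be written with multisets. The sketch below is only an illustration: the function name `greedy_combine`, the tuple layout `(s, add, dbl, prime_factors)` and the toy candidate list are assumptions, and it ignores the score refinement and the ratio thresholds discussed later.

```python
from collections import Counter
from math import prod


def greedy_combine(primes, candidates):
    """Greedily cover the multiset `primes` with candidate integers.

    candidates: list of tuples (s, add, dbl, prime_factors).
    Returns the chosen candidates and the cofactor c of unmatched primes.
    """
    M = Counter(primes)
    remaining = list(candidates)
    chosen = []
    while M:
        fitting = [c for c in remaining
                   if all(M[p] >= n for p, n in Counter(c[3]).items())]
        if not fitting:
            break  # greedy search is stuck; the rest ends up in the cofactor
        best = max(fitting, key=lambda c: c[2] / c[1])  # highest dbl/add ratio
        chosen.append(best)
        remaining.remove(best)
        M -= Counter(best[3])
    return chosen, prod(M.elements())


# Toy input: odd part of lcm(1, ..., 12) = 3 * 3 * 5 * 7 * 11
toy_primes = [3, 3, 5, 7, 11]
toy_candidates = [
    (35, 2, 5, [5, 7]),   # 35 = (2^3 + 1) * 2^2 - 1
    (33, 1, 5, [3, 11]),  # 33 = 2^5 + 1
    (9, 1, 3, [3, 3]),    # 9 = 2^3 + 1
    (15, 1, 4, [3, 5]),   # 15 = 2^4 - 1
]
chosen, cofactor = greedy_combine(toy_primes, toy_candidates)
```

On this toy input the greedy choice picks 33 and then 15, after which no remaining candidate fits and the prime 7 is left over as the cofactor $c$, which mirrors the failure case described in the text.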

A refinement to this approach is to also take the size of the prime factors
$s_{i,j}$ into account. A strategy could be to prefer choosing integers $s_i$ which have
mostly large prime divisors, since the majority of the primes $p_i$ is large. The idea
is to attach a score to a $B_1$-powersmooth integer given its prime factorization
with respect to the currently unmatched prime factors in $k$. For a multiset $N$ of
primes bounded by $B_1$ the ratio of $j$-bit primes is defined as

$$a_j(N) = \frac{\#\{p \in N \mid \lceil \log_2(p) \rceil = j\}}{\#N}$$

where $1 \le j \le \lceil \log_2(B_1) \rceil$. Given $M$, the multiset of currently unmatched primes,
the score of $s_i$ is defined as

$$\mathrm{score}(s_i, M) = \sum_{h=1}^{\lceil \log_2(B_1) \rceil} \frac{a_h(\{s_{i,1}, \ldots, s_{i,t_i}\})}{a_h(M)}.$$

The higher the score the more small prime divisors are likely to be present. In
general, for a given ratio, we select the integers which have a low score.

To illustrate, consider $B_1 = 1024$ where the initial $a_j$ are

$a_2 = 0.032$, $a_3 = 0.037$, $a_4 = 0.021$, $a_5 = 0.053$, $a_6 = 0.037$,
$a_7 = 0.069$, $a_8 = 0.122$, $a_9 = 0.229$, $a_{10} = 0.399$

(with $\sum_j a_j = 1$). Almost 40 percent of all the primes fall in the largest (10-bit)
category. An example of a low-score integer is
$11529215054666795009 = 743 \cdot 719 \cdot 677 \cdot 461 \cdot 457 \cdot 449 \cdot 337$
where the size of the smallest prime is 9-bit, the score
is 3.57 and this integer can be computed using 63 doublings and five additions
as $A_0 D^{11} A_0 D^{12} A_0 D^{10} A_0 D^{28} A_0 D^{2} \in \mathcal{R}_{68}$. On the other hand, an example of a
high-score integer, consisting of mainly small primes, is $1048575 = 41 \cdot 31 \cdot 11 \cdot 5^2 \cdot 3$,
its score is significantly higher (29.62) and it can be computed with 20 doublings
and a single subtraction as $S_0 D^{20} \in \mathcal{R}_{21}$.
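The two example scores can be reproduced with a short computation. The following Python sketch is an illustration under assumptions: the helper names `prime_multiset`, `bit_ratios` and `score` are hypothetical, the multiset $M$ is taken to be the odd primes of $\mathrm{lcm}(1, \ldots, B_1)$ with multiplicity, and the sum only runs over bit sizes that actually occur in the factorization (all other terms are zero).

```python
from collections import Counter


def prime_multiset(B1):
    """Odd primes of lcm(1, ..., B1), repeated once per power p^e <= B1."""
    sieve = [True] * (B1 + 1)
    primes = []
    for p in range(2, B1 + 1):
        if sieve[p]:
            for q in range(p * p, B1 + 1, p):
                sieve[q] = False
            if p > 2:
                power = p
                while power <= B1:
                    primes.append(p)
                    power *= p
    return primes


def bit_ratios(N):
    """a_j(N): fraction of primes p in N with ceil(log2(p)) == j."""
    counts = Counter((p - 1).bit_length() for p in N)
    return {j: c / len(N) for j, c in counts.items()}


def score(factors, M):
    a_s, a_M = bit_ratios(factors), bit_ratios(M)
    return sum(a_s[j] / a_M[j] for j in a_s)


M = prime_multiset(1024)
low = score([743, 719, 677, 461, 457, 449, 337], M)
high = score([41, 31, 11, 5, 5, 3], M)
```

Under these assumptions the multiset has 188 elements, and the two scores round to the values 3.57 and 29.62 quoted in the text.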

This approach using scores is outlined in Algorithm 1. Note that the scores are
recalculated each time an $s_i$ is chosen. In practice one could reduce the number
of these costly recalculations by picking several $s_i$ in lines 10-13 of the algorithm;
in this case one has to check that the union of the prime factors of the chosen
$s_i$ is still a multisubset of $M$.

A Randomized Variant. In the current state, Algorithm 1 returns a single
solution given a set of input parameters. To increase the number of different
subsets $I'$, and thereby hopefully improve the results, we randomize the
selection process of the index that is added in lines 10-13 of the algorithm. With
probability $x \in \mathbb{R}$ ($0 < x < 1$) select the $s_i$ corresponding to $\mathrm{score}_1$ or, with
probability $1 - x$, skip it and repeat this procedure for $\mathrm{score}_2$ and so on. If we
have reached the end of the list (after $j$ trials) one could apply a deterministic
choice.
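The randomized selection step can be sketched as follows. The function name `randomized_pick` and the fallback to the first (best-scored) entry are assumptions for illustration; the text only says that some deterministic choice could be applied once the list is exhausted.

```python
import random


def randomized_pick(sorted_candidates, x, rng):
    """Walk the score-sorted list; accept each entry with probability x."""
    for candidate in sorted_candidates:
        if rng.random() < x:
            return candidate
    # End of list reached: fall back to a deterministic choice (assumed here
    # to be the entry with the best score).
    return sorted_candidates[0]


rng = random.Random(2012)
pick = randomized_pick(["s_a", "s_b", "s_c"], 0.5, rng)
```

Repeating the whole greedy search with different seeds then yields many different subsets $I'$, from which the cheapest one can be kept.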
Algorithm 1. Given a bound $B_1$ and a set of $B_1$-powersmooth integers $\{s_i \mid i \in I\}$,
which can be computed with a chain using $\mathrm{add}(s_i)$ resp. $\mathrm{dbl}(s_i)$ elliptic curve
additions resp. doublings, together with the prime factorization of these integers
($s_i = \prod_j s_{i,j}$) the algorithm attempts to output triples $(s_i, \mathrm{add}(s_i), \mathrm{dbl}(s_i))$ such
that $\mathrm{lcm}(1, \ldots, B_1) = c \cdot \prod_i s_i$ for a small integer $c$. This algorithm considers
scores $< T$ only and combines integers $s_i$ for which $\frac{\mathrm{dbl}(s_i)}{\mathrm{add}(s_i)} \ge r$ where $r$ starts
at $r_h$ and is decreased until $r_l$.

Input: Bound $B_1 \in \mathbb{Z}$, we have $\mathrm{lcm}(1, \ldots, B_1) = \prod_{i=1}^{\ell} p_i$ with $p_i$ prime.
       Set of integers $\{s_i \mid i \in I\}$ with $s_i = \prod_{j=1}^{t_i} s_{i,j}$ for $s_{i,j}$ prime and $i \in I$.
       Upper and lower bound on the doubling/addition ratio: $r_h$ and $r_l$.
       A threshold value for the score: $T$.

Output: Triples $(s_i, \mathrm{add}(s_i), \mathrm{dbl}(s_i))$ and $c$ such that $c \cdot \prod s_i = \mathrm{lcm}(1, \ldots, B_1)$.

 1. $M \leftarrow \{p_1, \ldots, p_\ell\}$, $I' \leftarrow \emptyset$
 2. for $r = r_h$ to $r_l$ do
 3.     found $\leftarrow$ true
 4.     while found = true do
 5.         found $\leftarrow$ false, $j \leftarrow 0$
 6.         for $i \in I$ do
 7.             if $\{s_{i,1}, \ldots, s_{i,t_i}\} \subset M$ and $\frac{\mathrm{dbl}(s_i)}{\mathrm{add}(s_i)} \ge r$ and $\mathrm{score}(s_i, M) < T$ then
 8.                 $j \leftarrow j + 1$, $\mathrm{score}_j \leftarrow (\mathrm{score}(s_i, M), i)$
 9.         sort $\mathrm{score}_i$ for $1 \le i \le j$ with respect to $\mathrm{score}(s_i, M)$
10.         if $j \ge 1$ then
11.             $i \leftarrow$ index from $\mathrm{score}_1$, output $(s_i, \mathrm{add}(s_i), \mathrm{dbl}(s_i))$
12.             $I' \leftarrow I' \cup \{i\}$, $M \leftarrow M \setminus \{s_{i,1}, \ldots, s_{i,t_i}\}$
13.             found $\leftarrow$ true
14. output $\{(s_i, \mathrm{add}(s_i), \mathrm{dbl}(s_i)) \mid i \in I'\}$ and $c = \prod_{p \in M} p$


5 Additional Multiplications 

The fastest arithmetic for Edwards curves is due to Hisil et al. [S|. They pro- 
pose to use extended twisted Edwards coordinates, which are twisted Edwards 
coordinates plus an auxiliary coordinate. This allows faster addition but slower 
doubling. Using a mixing technique, by switching between extended twisted Ed- 
wards and regular twisted Edwards, the overall cost for scalar multiplication 
is reduced [03- This is realized by performing the doublings using the cheaper 
regular twisted Edwards coordinates when a doubling is followed by a doubling. 
When an addition is required after a doubling one can use the doubling for- 
mula in the extended twisted Edwards coordinates (which does not need the 
auxiliary coordinate as input) at the cost of an extra multiplication to compute 
the auxiliary coordinate of the result. Next, the fast addition is performed in 
extended twisted Edwards coordinates; one multiplication (to compute the aux- 
iliary coordinate of the output) can be saved, cancelling the extra multiplication 
used when doubling, since a doubling is always performed after an addition in
ECSM-algorithms. This approach assumes that both inputs of the elliptic curve
addition are in extended twisted Edwards coordinates. This is the case for
double-and-add algorithms and (signed) windowing algorithms where the
computation of the auxiliary coordinates of the lookup table is a minor overhead.

Table 2. The left table shows the number of integers (#int) generated with an
addition-subtraction chain using $A$ and $D$ elliptic curve additions and doublings
respectively. All these integers were tested for $2.9 \cdot 10^9$-powersmoothness and, if
smooth, the prime divisors are stored. The bold ranges indicate that $2^{31}$ random
integers per single $A$, $D$ combination were tested for smoothness instead of the
full range. The right table shows the number of unique $B_1$-powersmooth integers
in the no-storage and low-storage setting for different values of $B_1$.

No-storage setting
  A    D            #int
  1    5 - 200      3.920 · 10^2
  2    10 - 200     7.946 · 10^4
  3    15 - 200     1.050 · 10^7
  4    20 - 200     1.035 · 10^9
  5    25 - 200     8.114 · 10^10

Low-storage setting
  A    D            #int
  1    5 - 250      4.920 · 10^2
  2    10 - 250     2.487 · 10^5
  3    15 - 250
  4    20 - 250
  5    25 - 158     2.956 · 10^12
  5    159 - 220    1.331 · 10^11
  6    60 - 176     2.513 · 10^11

  B_1           No-storage       Low-storage
  256           2.423 · 10^5     9.210 · 10^6
  512           1.470 · 10^6     3.159 · 10^7
  1024          5.691 · 10^6     7.861 · 10^7
  8192          9.352 · 10^7     4.400 · 10^8
  2.9 · 10^9    2.274 · 10^10    3.997 · 10^10

In both our settings, where we consider low- and no-storage, this does not hold.
The computation of the large elliptic curve scalar product is done by processing
batches of prime products (the $s_i$) at a time. All the additions or subtractions
required in the chain to compute $s_i$ require that the points are in extended
twisted Edwards coordinates. When required, the odd intermediate results are
stored in extended twisted Edwards coordinates at a cost of a single additional
multiplication. The cost of computing a low-storage chain $(o_{m-1}, \ldots, o_0) \in Q_m$
resulting in $s_i$ is increased by $x(s_i)$ multiplications, where
$x(s_i) = \#\{j \mid \exists h : o_h \in \{A_j, S_j\},\ 0 \le h < m\}$; i.e. the number of unique indices
used in the additions and subtractions. Therefore we get for the cost function from
the previous section $\mathrm{cost}(s_i) = 7\,\mathrm{dbl}(s_i) + 8\,\mathrm{add}(s_i) + x(s_i)$. In the no-storage
setting we always have $x(s_i) = 1$ leading to the choice for $\mathrm{cost}(s_i)$ given at the
beginning of the previous section. In total we have #{chains used} additional
multiplications in the no-storage setting and a potentially higher number in the
low-storage setting. We can save one multiplication due to the sequence containing
the power of 2 (which consists of doublings only) and another multiplication if we
assume that the input point is already in extended twisted Edwards coordinates.
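The cost function can be evaluated directly from a chain description. The sketch below is illustrative only; the function name `chain_cost` and the list-of-pairs encoding (where `("D", k)` stands for $k$ doublings and `("A", j)` or `("S", j)` for $A_j$ or $S_j$) are assumptions made for this example.

```python
def chain_cost(ops):
    """cost = 7 * dbl + 8 * add + x, with x the number of distinct indices
    used by the additions and subtractions of the chain."""
    dbl = sum(n for op, n in ops if op == "D")
    add = sum(1 for op, _ in ops if op in ("A", "S"))
    x = len({j for op, j in ops if op in ("A", "S")})
    return 7 * dbl + 8 * add + x


# S_0 D^2 A_0 D^3 computes 35 with 5 doublings and 2 additions, all on index 0
cost_35 = chain_cost([("S", 0), ("D", 2), ("A", 0), ("D", 3)])
```

For the running example this gives $7 \cdot 5 + 8 \cdot 2 + 1 = 52$ multiplications; a low-storage chain that also adds a second stored point would pay one more.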

6 Results 

Using the rules given in Section 3.2 for both the no-storage and the low-storage
setting, we generated more than $10^{12}$ integers for many choices of the number of
additions $A$ and doublings $D$. Table 2 summarizes the ranges we have covered
where bold ranges (in the low-storage setting) indicate that only $2^{31}$ random
integers were generated instead of the full range. All these integers were subjected
to $2.9 \cdot 10^9$-powersmoothness tests which reduced the number of integers by
about two orders of magnitude. This large powersmoothness-bound was chosen
to facilitate searching for efficient chains for much larger $B_1$ parameters. From
the reduced set of integers we extracted those that are $B_1$-powersmooth for the
values of $B_1$ used in this paper (see right part in Table 2). These computations
were done on five 8-core Intel Xeon E5430 (2.66GHz) and took more than a year,
i.e., in total over 40 core years. The smoothness testing required most of the
run-time and up to 4.6GB of memory. Using the approach outlined in Algorithm 1
one of these nodes was occasionally used for the combining experiments which
consisted of thousands of runs of the randomized greedy approach, each of them
taking only a couple of seconds for these low values of $B_1$.

Table 3. The table shows the number of modular multiplications (M) and squarings
(S) required to calculate $A$ elliptic curve additions and $D$ doublings for various $B_1$
parameters when factoring an integer $n$ with ECM. The memory required is expressed
as the number of residues ($R$), integers modulo $n$, which are kept in memory. The
performance speedup PS (in terms of #M+#S) and memory reduction MR compared
to the ECM approach from P] using “$a = -1$” twisted Edwards curves is given.

  B_1    Method        #M      #S      #M+#S   PS     A      D       #R    MR
  256    EECM-MPFQ     1638    1436    3074           69     359     38
         No-storage    1400    1444    2844    1.08   38     361     10    3.80
         Low-storage   1383    1448    2831    1.09   35     362     14    2.71
  512    EECM-MPFQ     3183    2952    6135           120    738     62
         No-storage    2842    2964    5806    1.06   75     741     10    6.20
         Low-storage   2776    2964    5740    1.07   65     741     18    3.44
  1024   EECM-MPFQ     6144    5892    12036          215    1473    134
         No-storage    5596    5912    11508   1.05   141    1478    10    13.40
         Low-storage   5471    5904    11375   1.06   123    1476    18    7.44
  8192   EECM-MPFQ     45884   47156   93040          1314   11789   550
         No-storage    43914   47160   91074   1.02   1043   11790   10    55.00
         Low-storage   42855   47136   89991   1.03   878    11784   18    30.56

Table 3 shows the results obtained using Algorithm 1 on our dataset (see
Table 2). The memory required is expressed in the number of residues ($R$),
integers modulo $n$, which need to be kept in memory. Here we assume that
extended twisted Edwards coordinates are used, i.e., every point is represented
by four coordinates. In the setting of EECM-MPFQ jitlll we assume that an
optimal window size is used and that besides the window table only the input
point needs to be kept in memory while we assume that two points (the input
point and the current active point) are required in the no- and low-storage
setting. The implementation of the elliptic curve group operation is assumed to
require at most two auxiliary variables (residues). Hence, the no-storage setting
requires memory for $2 \times 4 + 2 = 10$ residues modulo $n$. The low-storage results
presented in Table 3 require storing at most two additional points (8 more
residues modulo $n$ compared to the no-storage setting). This is still significantly
less compared to the approach used in m-
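The memory figures follow from simple bookkeeping, which the following Python sketch reproduces. The function name `residues` is an assumption for illustration; the constants (four coordinates per point, two auxiliary residues) are the ones stated in the text, and the baseline of 550 residues for $B_1 = 8192$ is taken from Table 3.

```python
def residues(points, coords_per_point=4, aux=2):
    """Residues modulo n kept in memory for a given number of stored points."""
    return points * coords_per_point + aux


no_storage = residues(2)        # input point + current active point
low_storage_max = residues(4)   # at most two additional stored points
mr_no_storage = 550 / no_storage
mr_low_storage = 550 / low_storage_max
```

This gives 10 and 18 residues and memory reductions of 55.00 and about 30.56 for $B_1 = 8192$, matching the #R and MR columns of Table 3.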
6.1 Application to GPUs

When running ECM on memory constrained devices, like GPUs, the large num- 
ber of precomputed points required for the windowing methods cannot be stored 
in fast memory. Typically one is forced to settle for a (much) smaller window 
size reducing the advantage from using twisted Edwards curves. For example, 
in |5| no large window sizes are used at all, the authors remark: “Besides the 
base point, we cannot cache any other points”. Memory is also a problem in 0, 
the faster curve arithmetic from Hisil et al. m is not used since this requires 
storing a fourth coordinate per point. 

From the data given in Table 3 it becomes clear that our approach reduces
the memory requirements significantly. For example, the memory required to run
ECM in the cofactorization setting on GPUs using $B_1 = 8192$ can be reduced
by a factor 55. This setting was already considered in [5I4| where the authors 
were forced to reduce memory requirements by using suboptimal window sizes. 
Hence, when using the methods described in this paper less memory is required 
allowing the usage of the faster curve arithmetic and reducing the number of 
elliptic curve additions required in the computation of the elliptic curve scalar 
multiplication. 


6.2 Performance Comparison 

In order to measure the practical speedup of the methods described in this 
paper we implemented the no-storage approach on GPUs. This implementation 
uses the Compute Unified Device Architecture (CUDA) which facilitates the 
development of massively-parallel general purpose applications for GPUs ElU- 
Our implementation is targeted at the third generation CUDA GPUs called 
“Fermi” [29]. Table 4 compares the performance results of different hardware
platforms for $B_1 = 960$ and $B_1 = 8192$, numbers chosen such that we can directly
compare to results reported in the literature on other (hardware) platforms. For
$B_1 = 960$, which is used as the example $B_1$ value in [4011 1) and not spending
as much effort as for $B_1 = 1024$, we were able to construct a no-storage chain
requiring 1371 doublings and 135 additions. The FPGA and GTX295 results are
quadratically scaled to 192-bit arithmetic to compare the different performance 
results. The other GPU results are from 0 and this implementation is optimized 
for the second generation CUDA GPUs. The pricing for this card is omitted since 
it is no longer sold (this card was launched January 2009). The results on the 
Intel i7-2600K CPUs have been obtained with the ECM implementation (using 
Montgomery curves) from the NFS software suite [14| which is responsible for 
all recent record NFS factorizations (e.g. 03) and the EECM-MPFQ software 
package 0] which uses Edwards curves. The FPGA results are from pi 1 14( )] and 
the FPGA prices are taken from pTT11 . Note that the prices are for the GPU, CPU 
or FPGA devices only; in order to get a fully operational system more hardware 
is required. Note also that for all of the considered devices newer versions with 
better price performance ratio exist, but we do not expect that these will change 
this comparison significantly. 
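The quadratic scaling to 192-bit arithmetic and the price normalization can be written out as a short calculation. The helper names `scaled_curves` and `per_100_usd` are assumptions for illustration; the input figures are the raw curves per second, modulus sizes and prices listed in Table 4.

```python
def scaled_curves(curves_per_sec, bits, target_bits=192):
    """Scale a throughput quadratically from `bits`-bit to 192-bit moduli."""
    return curves_per_sec * (bits / target_bits) ** 2


def per_100_usd(curves_per_sec, price_usd):
    """Throughput per 100 USD of device price."""
    return curves_per_sec / (price_usd / 100)


fpga_scaled = scaled_curves(3240, 202)         # V4SX35-10, 202-bit moduli
fpga_per_100 = per_100_usd(fpga_scaled, 468)
gpu_per_100 = per_100_usd(171486, 400)         # GTX 580, no-storage
cpu_per_100 = per_100_usd(13661, 300)          # Intel i7-2600K
```

With these inputs the scaled FPGA throughput is about 3586 curves per second, or about 766 per 100 USD, and the GPU to CPU ratio comes out at roughly 9.41, in line with the B1 = 960 part of Table 4.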
Table 4. Performance comparison of ECM on different platforms (using the “$a = -1$”
twisted Edwards curves if available). The first table lists the different hardware
properties. The second and third table state results for $B_1 = 960$ and $B_1 = 8192$
respectively. The scaled number of curves are when using 192-bit moduli. The
performance ratio is the ratio between the GTX 580 no-storage row and the current
row for the scaled number of curves per 100 USD.

  properties         GPU                   CPU               FPGA
                     GTX 295    GTX 580    Intel i7-2600K    V4SX35-10    V4SX25-10
  #cores             480        512        4                 24           1
  clock (MHz)        1242       1544       3400              200          220
  price (USD)                   400        300               468          298
  #threads           46080      8192       4                 24           1
  #bits in moduli    210        192        192               202          135

  performance (#curves), B_1 = 960
                         (1/sec)    (1/sec, scaled)    (1/100 USD, scaled)    performance ratio
  GTX 580, no-storage    171486     171486             42872                  1.00
  GTX 580, windowing     79170      79170              19793                  2.17
  Intel i7 P0J           13661      13661              4554                   9.41
  Intel i7 0             8677       8677               2892                   14.82
  V4SX35-10 |U           3240       3586               766                    55.97
  V4SX25-10 UU           16000      7910               2654                   16.15

  performance (#curves), B_1 = 8192
                         (1/sec)    (1/sec, scaled)    (1/100 USD, scaled)    performance ratio
  GTX 295 B|             4928       5895
  GTX 580, no-storage    19869      19869              4967                   1.00
  GTX 580, windowing     9106       9106               2277                   2.18
  Intel i7 P0]           1629       1629               543                    9.15
  Intel i7 0             1092       1092               364                    13.65


For the sake of comparison we also implemented Edwards ECM for GPUs 
using the same 192-bit arithmetic but using the windowing based approach. 
For $B_1 = 960$ ($B_1 = 8192$) we used a signed sliding window of size $2^6$ ($2^8$),
precomputing and storing $2^5$ ($2^7$) points in extended twisted Edwards coordinates.
These results are stated in Table 4 as well. On the GTX 580 the no-storage approach
is more than twice as fast as the approach based on windowing techniques.
This is significantly better than the theoretical numbers from Table 3. When
running exactly the same experiment on 96-bit (three 32-bit limbs instead of
six 32-bit limbs) moduli the number of curves per second for the no-storage
and windowing approach is 76 665 and 75 584 for $B_1 = 8192$ and 649 904 and
618 111 for $B_1 = 960$, respectively. We think that this behaviour can be partially
explained by an increased memory usage for the windowing approach and a 
better handling of the no-storage approach by the compiler since this approach 
uses fewer variables. 

Another interesting observation is that the FPGA performance per 100 USD
is lower than that of the CPU-based approaches. Furthermore, aided by the
no-storage approach outlined in this paper, the GPU performance is almost an
order of magnitude faster per 100 USD than the CPU and more than an order
of magnitude faster compared to the fastest FPGA results. This suggests that
GPUs are the best platform, i.e. give the best performance / price ratio, for
integer cofactorization.


7 Conclusion 

The relatively new Edwards curves combined with the fast arithmetic from ex- 
tended twisted Edwards coordinates are faster compared to using Montgomery 
curves. This speed-up comes at a price, namely a larger memory requirement 
which, when optimizing for speed, grows roughly linearly in the size of $B_1$,
whereas the memory requirement in the Montgomery curves setting is constant
and small. Inspired by the approach from Dixon and Lenstra and using the fact
that only a few popular $B_1$-values are used in practice in NFS, we have presented
techniques to reduce the memory requirement significantly by doing
precomputations for these $B_1$-values. In these precomputations we tested over $10^{12}$
integers coming from chains with a low addition/doubling ratio for smoothness and
combined them using a greedy approach. Our results show that we require
significantly less memory compared to the current state-of-the-art Edwards ECM
approach, and are even slightly faster. This makes our approach extremely suitable
for memory-constrained parallel architectures like GPUs. This is demonstrated 
by our GPU implementation which sets a new ECM cofactorization throughput 
speed record. 
Abstract. In 2003 Michael Alekhnovich (FOCS 2003) introduced a
novel variant of the learning parity with noise problem and showed that 
it implies IND-CPA secure public-key cryptography. In this paper we in- 
troduce the first public-key encryption-scheme based on this assumption 
which is IND-CCA secure in the standard model. Our main technical 
tool to achieve this is a novel all-but-one simulation technique based 
on the correlated products approach of Rosen and Segev (TCC 2009). 
Our IND-CCA1 secure scheme is asymptotically optimal with respect to 
ciphertext-expansion. To achieve IND-CCA2 security we use a technique 
of Dolev, Dwork and Naor (STOC 1991) based on one-time-signatures. 
For practical purposes, the efficiency of the IND-CCA2 scheme can be 
substantially improved by the use of additional assumptions to allow for 
more efficient signature schemes. Our results make Alekhnovich’s vari- 
ant of the learning parity with noise problem a promising candidate to 
achieve post quantum cryptography. 

Keywords: IND-CCA2 Security, Learning Parity with Noise, All-But- 
One Decryption. 


1 Introduction 

This paper presents the first IND-CCA2 secure cryptosystem based on a
computational assumption first introduced by Michael Alekhnovich in the year 2003
[Ale03]. This assumption essentially states that for a given random linear code $C$
with a constant rate, a random code word with an inverse square root fraction of
noise is indistinguishable from a random string. Alekhnovich [Ale03] was able to
construct a semantically secure cryptosystem which was based solely on this
assumption. It can be seen as a special case of the decisional learning parity with
noise (LPN) problem. The decisional LPN problem (henceforth LPN problem)
asks to distinguish noisy binary linear equations $Ax + e$ from uniformly random.
The problem is parametrized by the number of samples provided (i.e. the number
of rows of $A$) and the amount of noise (i.e. the distribution of $e$). While most
cryptographic constructions based on LPN (e.g. [HB01, JW05, KSS10]) use the
standard parameter-choice of a polynomial number of samples and a constant
fraction of noise, Alekhnovich’s LPN problem uses a linear number of samples
and an inverse square root fraction of noise. These two parameter-choices are
apparently incomparable. On one side, providing a larger amount of samples
makes the problem apparently easier. On the other side, a larger amount of
noise seems to make the problem harder. Nevertheless, Alekhnovich’s
parameter choice seems to yield the stronger assumption, as constructing a public key
cryptosystem from LPN with a constant fraction of noise remains an important
open problem.

X. Wang and K. Sako (Eds.): ASIACRYPT 2012, LNCS 7658, pp. 485- ED51 2012.
(c) International Association for Cryptologic Research 2012
N. Döttling, J. Müller-Quade, and A.C.A. Nascimento
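To make the two parameter regimes concrete, the following Python sketch generates LPN samples $b = Ax + e$ over the binary field. It is only an illustration: the function names `lpn_samples` and `alekhnovich_parameters`, the rate constant 2, and the use of independent Bernoulli noise are assumptions of this example and not taken from the paper.

```python
import math
import random


def lpn_samples(n, m, rho, rng):
    """Return (A, x, b) with b = A x + e over GF(2) and e_i ~ Bernoulli(rho)."""
    x = [rng.randrange(2) for _ in range(n)]
    A = [[rng.randrange(2) for _ in range(n)] for _ in range(m)]
    b = []
    for row in A:
        noise = 1 if rng.random() < rho else 0
        b.append((sum(a * xi for a, xi in zip(row, x)) + noise) % 2)
    return A, x, b


def alekhnovich_parameters(n, rate=2):
    """Hypothetical helper: a linear number of samples, noise rate n^(-1/2)."""
    return rate * n, 1 / math.sqrt(n)


m, rho = alekhnovich_parameters(16)
A, x, b = lpn_samples(16, m, rho, random.Random(1))
```

In the standard regime one would instead take a polynomial number of samples with a constant `rho`; the distinguishing task is to tell `(A, b)` apart from `(A, r)` for a uniformly random bit vector `r`.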

LPN assumptions are of a more combinatorial nature and seem incomparable
to the algebraic assumptions needed for the McEliece cryptosystem. For
the security of the McEliece cryptosystem one has to additionally assume that
scrambled Goppa-codes are computationally indistinguishable from random linear
codes [McE78] IHS081 IMKMOSt [DMQN09]. Moreover, though there is a
syntactic similarity to the learning with errors (LWE) problem, LPN and LWE also
seem rather incomparable. LWE asks to distinguish a polynomial number of noisy
linear equations over $\mathbb{Z}_q$ (for a polynomial sized $q$), where the error-distribution
is euclidean, from uniformly random. IND-CCA2 encryption schemes based on
LWE [PW08, Pei09, MP12] use properties that are very specific to LWE (e.g.
short dual-lattice bases) and not available in the binary domain. It has been open
for nine years if an IND-CCA2 secure scheme could be built from Alekhnovich’s
LPN problem. In this paper we present such an IND-CCA2 secure scheme which
is based on the all-but-one approach [DDN00, PW08, RS09]. The new
construction is asymptotically optimal for IND-CCA1 security. It has only a constant
factor ciphertext-expansion and the ciphertexts are of size $O(k^{2/(1-2\epsilon)})$, where
$k$ is the security parameter and $\epsilon$ a small constant. To achieve IND-CCA2
security we use a generic transformation based on one-time-signatures [DDN00].
A more efficient construction is possible using additional assumptions yielding
more efficient signature schemes. The trapdoor of our scheme is substantially
different from Alekhnovich’s original construction, but bears some similarities
with the above-mentioned lattice-based constructions. It allows witness
recovery and decryption with incomplete keys, which is necessary for applying the
all-but-one approach. Different from [PW08, RS09, Pei09, DMQN09] we do not
achieve the all-but-one property by repeatedly encrypting the same ciphertext
or a correlated product. We employ a bitwise decryption and use error correction
to cope with incomplete decryptions. The novel all-but-one simulation technique
employed in this construction allows for a significant improvement in efficiency
compared with previous constructions. While this new technique might be of
interest in lattice-based cryptography, we see no obvious way to make use of
our technique in McEliece-based constructions. Crucial to our technique is the
ability to recover individual bits of a plaintext from the ciphertext using a
partial secret key. This, however, seems out of reach for constructions based on the
McEliece assumption.

Related Work. Ciphertext indistinguishability under chosen ciphertext attacks
(IND-CCA2 security) is one of the strongest known notions of security for
public key encryption schemes (PKE). Many computational assumptions have
been used in the literature for obtaining cryptosystems meeting this security
notion. Given one-way trapdoor permutations, CCA2 security can be obtained from
any semantically secure public key cryptosystem [NY90, Sah99, Lin03]. Efficient
constructions are also known based on number-theoretic assumptions [CS98,
CS03, HK09], lattice-based assumptions [PW08, Pei09, MP12], the McEliece
assumption [DMQN09] or identity based encryption schemes [CHK04].

2 Preliminaries 

2.1 Coding-Theory 

We need a few coding-theoretic facts and constructions for our schemes and 
proofs. We denote the finite field with $q$ elements by $\mathbb{F}_q$. The
hamming-weight $|x|$ of a vector $x \in \mathbb{F}_q^n$ is the number of its
non-zero locations. The $q$-ary entropy function is defined as
$H_q(\alpha) = \alpha \log_q(q-1) - \alpha \log_q \alpha - (1-\alpha) \log_q(1-\alpha)$.
It assumes its maximum at $\alpha = 1 - 1/q$ with $H_q(1 - 1/q) = 1$. The volume
$\mathrm{Vol}_q(\alpha n, n)$ of the hamming-ball of radius $\alpha n$ in
$\mathbb{F}_q^n$ can be bounded by
$q^{H_q(\alpha) n - o(n)} \le \mathrm{Vol}_q(\alpha n, n) \le q^{H_q(\alpha) n}$.
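The entropy function and the volume bound can be checked numerically. The following is a minimal sketch, not part of the original text; the function names `entropy_q` and `ball_volume` are chosen here for illustration only.

```python
from math import comb, log


def entropy_q(alpha: float, q: int) -> float:
    """q-ary entropy H_q(alpha) for 0 <= alpha <= 1 (illustrative helper)."""
    if alpha == 0.0:
        return 0.0
    if alpha == 1.0:
        return log(q - 1, q)
    return (alpha * log(q - 1, q)
            - alpha * log(alpha, q)
            - (1 - alpha) * log(1 - alpha, q))


def ball_volume(radius: int, n: int, q: int) -> int:
    """Exact number of vectors in F_q^n of hamming-weight at most `radius`."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(radius + 1))
```

For example, `log(ball_volume(20, 200, 2), 2) / 200` can be compared against `entropy_q(0.1, 2)` to see how close the upper bound is for a moderate block length.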

Random Codes and the Gilbert-Varshamov bound. The Gilbert-Varshamov bound
guarantees the existence of $q$-ary codes with almost maximal relative
minimum-distance $1 - 1/q$. Moreover, with high probability, randomly chosen
codes enjoy this property. Let $n, d, k \in \mathbb{N}$ and $\lambda > 0$. If it
holds that $k \le n - \log_q \mathrm{Vol}_q(d, n) - \lambda n$, then the code
$\mathcal{C}(G)$ generated by a uniformly chosen matrix
$G \in \mathbb{F}_q^{n \times k}$ has minimum-distance at least $d$, except with
probability $q^{-\lambda n}$. Therefore, if $\delta < 1 - 1/q$ it holds that
$n - \log_q \mathrm{Vol}_q(\delta n, n) \ge (1 - H_q(\delta)) n =: \zeta n$.
Thus, if $k \le \zeta n / 2$, a uniformly random chosen matrix
$G \in \mathbb{F}_q^{n \times k}$ generates a code $\mathcal{C}(G)$ with
minimum-distance at least $\delta n$, except with probability $q^{-\zeta n / 2}$.

Asymptotically good codes with efficient error-correction. The decryption
algorithm of our scheme will introduce errors in the plaintext when decrypting.
We will therefore use asymptotically good error-correcting codes $\mathcal{C}$
with efficient error-correction algorithm $\mathsf{Decode}_{\mathcal{C}}$ to
encode plaintexts. Prominent examples of such codes are binary expander-codes
[SS96, Zem01]: There exists an explicit family of binary linear codes
$\{\mathcal{C}_n\}$ of constant rate $R$ arbitrarily close to 1 that can
efficiently correct an $\alpha$-fraction of errors, for a constant $\alpha > 0$.

2.2 Bernoulli Distributions and Bounds 

In this section we will briefly gather some facts about low-noise Bernoulli
distributions. While Alekhnovich's [Ale03] original proposal used a noise
distribution that samples vectors of low weight $t$ uniformly at random, we will
use Bernoulli-distributions where each bit of a vector is 1 with probability
$t/n$ and otherwise 0. The advantage of Bernoulli-distributions over the former
distribution is that all components are independent of one another. We will take
advantage of this fact when bounding the hamming-weight of matrix-vector
products when the matrix is chosen from a Bernoulli distribution. The
decryption-algorithm of Alekhnovich's and our encryption-scheme computes
inner-products of Bernoulli-distributed vectors. To ensure that the
inner-product of two Bernoulli-distributed vectors is 0 with high probability,
we need to choose the bit-flip probability $p$ below the order of $1/\sqrt{n}$.
If $p$ is too big (e.g. constant), then the distribution of the inner-product
would be statistically close to uniform and our decryption-approach would fail.
Finally, we show that matrices $X$ chosen from a component-wise low-noise
Bernoulli distribution enjoy (with high probability) the property that a product
$Xs$ has low hamming-weight, for any vector $s$ with sufficiently small
hamming-weight. We will call such matrices good, and we will use this property
for proving correctness of our schemes and in the proof of IND-CCA1 security.

Bernoulli distributions. For a noise-parameter $p$, we write $\chi_p$ for the
Bernoulli-distribution that outputs 1 with probability $p$ and 0 with
probability $1 - p$. The distribution of the hamming-weight of a vector of $n$
iid Bernoulli-distributed random variables is the binomial distribution
$B_{p,n}$. Throughout the paper, we frequently need to bound Binomial
distributions. For this we require two different Chernoff bounds. Let $x$ be
distributed by $\chi_p^n$.

1. It holds for any $R > 6pn$ that $\Pr[|x| > R] < 2^{-R}$.

2. It holds for any $0 < \delta < 1$ that $\Pr[\,||x| - pn| > \delta pn\,] < 2e^{-\delta^2 pn / 3}$.
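For small parameters both Chernoff bounds can be compared against the exact binomial tails. The sketch below is only an illustration under that assumption; the helper names `binomial_tail` and `two_sided_tail` are not from the paper.

```python
from math import comb, exp


def binomial_tail(n: int, p: float, r: int) -> float:
    """Exact Pr[|x| > r] for x distributed by chi_p^n."""
    return sum(comb(n, w) * p ** w * (1 - p) ** (n - w)
               for w in range(r + 1, n + 1))


def two_sided_tail(n: int, p: float, delta: float) -> float:
    """Exact Pr[| |x| - pn | > delta * pn] for x distributed by chi_p^n."""
    mean = p * n
    return sum(comb(n, w) * p ** w * (1 - p) ** (n - w)
               for w in range(n + 1) if abs(w - mean) > delta * mean)


def chernoff_two_sided(n: int, p: float, delta: float) -> float:
    """Right-hand side of the second bound, 2 * exp(-delta^2 * p * n / 3)."""
    return 2.0 * exp(-delta ** 2 * p * n / 3.0)
```

With `n = 100` and `p = 0.05`, any `R > 30` satisfies the hypothesis of the first bound, so `binomial_tail(100, 0.05, 31)` can be compared with `2 ** -31`.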

Distributions of inner products. For the decryption-algorithms of our schemes we
require that the inner-product of a Bernoulli-distributed vector $x$ and a
vector $s$ of small hamming-weight is 0 with probability bounded away from 1/2.
We will thus show that the probability of the inner-product being 1 is
sub-constant for a proper choice of $p$. Let $s \in \mathbb{F}_2^n$ be a fixed
vector and $x$ be distributed by $\chi_p^n$. By a simple XOR-Lemma, it holds
that

$$\Pr[x^T s = 1] = \tfrac{1}{2} \cdot \left(1 - (1 - 2p)^{|s|}\right),$$

i.e. the random variable $x^T s$ is distributed according to $\chi_{p'}$ with
$p' = \tfrac{1}{2} \cdot (1 - (1 - 2p)^{|s|})$. If it holds that
$p = p(n) = O(n^{-1/2-\epsilon})$ for some constant $\epsilon > 0$ and
$|s| \le \gamma pn$ for some constant $\gamma > 0$, we get the following
estimate for $p'$. By the mean-value-theorem it holds for any $p$ in the
interval $(0, e^{-1})$ that $e^{-ep} < 1 - p$ (here $e$ denotes Euler's number),
therefore we get

$$p' = \tfrac{1}{2} \cdot \left(1 - (1 - 2p)^{|s|}\right) < \tfrac{1}{2} \cdot \left(1 - e^{-2ep|s|}\right) \le \tfrac{1}{2} \cdot \left(1 - e^{-2e\gamma p^2 n}\right) = \tfrac{1}{2} \cdot \left(1 - e^{-O(n^{-2\epsilon})}\right).$$

The last term is sub-constant in $n$, i.e. $p'(n) = o(1)$. This means that for
sufficiently large $n$, $p'$ is arbitrarily small.
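The closed form of the XOR-Lemma can be verified by brute force for small weights. This is a minimal sketch with illustrative function names; only the coordinates of $x$ on the support of $s$ matter, so the enumeration runs over $|s|$ bits.

```python
from itertools import product


def xor_lemma(p: float, weight: int) -> float:
    """Closed form for Pr[x^T s = 1] with x ~ chi_p^n and |s| = weight."""
    return 0.5 * (1.0 - (1.0 - 2.0 * p) ** weight)


def brute_force(p: float, weight: int) -> float:
    """Same probability, summing over all bit patterns on the support of s."""
    total = 0.0
    for bits in product((0, 1), repeat=weight):
        ones = sum(bits)
        if ones % 2 == 1:  # inner product is 1 iff an odd number of bits is set
            total += p ** ones * (1.0 - p) ** (weight - ones)
    return total
```

For a constant $p$ such as 0.25 the value `xor_lemma(0.25, 20)` is already very close to 1/2, which illustrates why the noise rate has to be sub-constant.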
Multiplication with random matrices. We will now give bounds for how much the
hamming-weight of a vector $s$ increases when multiplied with a matrix
$X \in \mathbb{F}_2^{l \times n}$ chosen from $\chi_p^{l \times n}$. Let $x$ be
distributed by $\chi_p^n$ and the hamming-weight of $s$ be bounded by
$|s| \le \gamma pn$. Then by the above $p' = \Pr[x^T s = 1]$ can be made an
arbitrarily small constant if $p = O(n^{-1/2-\epsilon})$. If
$X \in \mathbb{F}_2^{l \times n}$ is distributed by $\chi_p^{l \times n}$, then
$|Xs|$ is distributed by the Binomial-distribution $B_{p',l}$. The
Chernoff-bound thus yields that for any $R > 6p'l$ it holds that
$\Pr[|Xs| > R] < 2^{-R}$. The volume $\mathrm{Vol}_2(\gamma pn, n)$ of the
hamming-ball of radius $\gamma pn$ in $\mathbb{F}_2^n$ is bounded by
$2^{H_2(\gamma p) n}$. Thus, there are at most $2^{H_2(\gamma p) n}$ vectors $s$
satisfying $|s| \le \gamma pn$. A union-bound yields for any $R > 6p'l$

$$\Pr[\exists s \in \mathbb{F}_2^n : |s| \le \gamma pn \text{ and } |Xs| > R] < 2^{H_2(\gamma p) n} \cdot 2^{-R}.$$

If $l = \Omega(n)$ and $\beta > 0$ it holds that

$$\Pr[\exists s \in \mathbb{F}_2^n : |s| \le \gamma pn \text{ and } |Xs| > \beta l] < 2^{-\Omega(n)},$$

as $H_2(\gamma p) n$ is sub-linear in $n$ (i.e. $o(n)$) since
$p = O(n^{-1/2-\epsilon})$.

Definition 1. Fix a constant $\beta$ and $\epsilon = \epsilon(n)$. We shall call
a matrix $X \in \mathbb{F}_2^{l \times n}$ $(\beta, \epsilon)$-good, if for all
$s \in \mathbb{F}_2^n$ with $|s| \le \epsilon n$ it holds that
$|Xs| \le \beta l$.

The above now implies that for $p = O(n^{-1/2-\epsilon})$, any fixed
$\beta, \gamma > 0$ and sufficiently large $n$, a matrix $X$ sampled from
$\chi_p^{l \times n}$ is $(\beta, \gamma p)$-good with overwhelming probability
in $n$.
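For toy dimensions the goodness property of Definition 1 can be tested exhaustively. The sketch below is an illustration only: it enumerates every $s$ of weight at most $\epsilon n$, which is exponential in general and therefore not how goodness would be argued for real parameters. All names are hypothetical.

```python
import random
from itertools import combinations


def sample_bernoulli_matrix(rows, cols, p, rng):
    """Sample a rows x cols matrix with iid chi_p entries."""
    return [[1 if rng.random() < p else 0 for _ in range(cols)]
            for _ in range(rows)]


def weight_of_product(X, support):
    """Hamming weight of X*s over F_2, where s has ones exactly on `support`."""
    return sum(sum(row[j] for j in support) % 2 for row in X)


def is_good(X, beta, eps):
    """Check |Xs| <= beta*l for every s with |s| <= eps*n (brute force)."""
    l, n = len(X), len(X[0])
    max_weight = int(eps * n)
    for w in range(max_weight + 1):
        for support in combinations(range(n), w):
            if weight_of_product(X, support) > beta * l:
                return False
    return True
```

A call such as `is_good(sample_bernoulli_matrix(16, 12, 0.05, random.Random(0)), 0.5, 0.25)` exercises the check on a randomly sampled low-noise matrix.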

2.3 Public Key Encryption 

This Section is only meant to provide reference for the standard notions of
security for encryption schemes and can be safely skipped. Let $k$ be a security
parameter.

Definition 2. A public key encryption scheme PKE is a tuple (KeyGen, Enc, Dec),
such that

- $\mathsf{KeyGen}(1^k)$ is a PPT-algorithm that takes a security-parameter $k$
and outputs a pair of public and private keys $(pk, sk)$.

- $\mathsf{Enc}_{pk}(m)$ is a PPT-algorithm that takes a public key $pk$, a
message $m$ and outputs a ciphertext $c$.

- $\mathsf{Dec}_{sk}(c)$ is an efficient deterministic algorithm taking as input
a secret key $sk$ and a ciphertext $c$ and outputs a plaintext $m$.

A standard-requirement for public key encryption is correctness.

Definition 3. We say that PKE = (KeyGen, Enc, Dec) is correct, if it holds for
all plaintexts $m$ that

$$\Pr[\mathsf{Dec}_{sk}(\mathsf{Enc}_{pk}(m)) \neq m : (pk, sk) = \mathsf{KeyGen}(1^k)] < \mathrm{negl}(k).$$

The three security notions for public key encryption we are concerned with in
this paper are IND-CPA, IND-CCA1 and IND-CCA2 security. Let $\mathcal{A}$ be an
adversary.

Experiment: IND-CPA.

- Generate a pair of keys $(pk, sk) = \mathsf{KeyGen}(1^k)$. Run $\mathcal{A}$
on input $pk$.

- Once $\mathcal{A}$ outputs a pair $(m_0, m_1)$, flip a coin $b$ and compute
$c^* = \mathsf{Enc}_{pk}(m_b)$. Give input $c^*$ to $\mathcal{A}$ and continue
its computation.

- Let $b'$ be $\mathcal{A}$'s output. Output 1 if $b' = b$ and 0 otherwise.

Experiment: IND-CCA1.

- Generate a pair of keys $(pk, sk) = \mathsf{KeyGen}(1^k)$. Give $\mathcal{A}$
access to a decryption-oracle $\mathsf{Dec}_{sk}(\cdot)$ and run $\mathcal{A}$
on input $pk$.

- Once $\mathcal{A}$ outputs a pair $(m_0, m_1)$, flip a coin $b$ and compute
$c^* = \mathsf{Enc}_{pk}(m_b)$. Give input $c^*$ to $\mathcal{A}$ and continue
its computation without access to the decryption-oracle.

- Let $b'$ be $\mathcal{A}$'s output. Output 1 if $b' = b$ and 0 otherwise.

Experiment: IND-CCA2.

- Generate a pair of keys $(pk, sk) = \mathsf{KeyGen}(1^k)$. Give $\mathcal{A}$
access to a decryption-oracle $\mathsf{Dec}_{sk}(\cdot)$ and run $\mathcal{A}$
on input $pk$.

- Once $\mathcal{A}$ outputs a pair $(m_0, m_1)$, flip a coin $b$ and compute
$c^* = \mathsf{Enc}_{pk}(m_b)$. Give input $c^*$ to $\mathcal{A}$ and continue
its computation with access to the decryption-oracle (which may not be queried
on $c^*$ itself).

- Let $b'$ be $\mathcal{A}$'s output. Output 1 if $b' = b$ and 0 otherwise.

Definition 4. For $X \in \{\mathrm{CPA}, \mathrm{CCA1}, \mathrm{CCA2}\}$, we say
that the scheme PKE = (KeyGen, Enc, Dec) is IND-X secure, if it holds for every
PPT-adversary $\mathcal{A}$ that
$\mathrm{Adv}_{\text{IND-X}}(\mathcal{A}) = |\Pr[\text{IND-X}(\mathcal{A}) = 1] - 1/2| < \mathrm{negl}(k)$.
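The IND-CPA experiment can be phrased as a small test harness. The sketch below is an illustration, not part of the paper: the adversary interface (`choose`, `guess`) and the deliberately insecure toy scheme are assumptions made only to show how the experiment and the advantage fit together.

```python
import secrets


def ind_cpa_experiment(keygen, enc, adversary):
    """One run of the IND-CPA experiment; returns 1 iff the adversary guesses b."""
    pk, _sk = keygen()
    m0, m1, state = adversary.choose(pk)       # adversary outputs (m_0, m_1)
    b = secrets.randbelow(2)                   # the experiment's coin flip
    challenge = enc(pk, (m0, m1)[b])           # c* = Enc_pk(m_b)
    guess = adversary.guess(state, challenge)  # adversary outputs b'
    return 1 if guess == b else 0


# Hypothetical, deliberately insecure scheme: "encryption" is the identity.
def toy_keygen():
    return None, None


def toy_enc(pk, message):
    return message


class CompareAdversary:
    """Wins against the toy scheme by reading the plaintext off the ciphertext."""

    def choose(self, pk):
        return b"0", b"1", None

    def guess(self, state, challenge):
        return 0 if challenge == b"0" else 1
```

Against the toy scheme the adversary wins every run, so its advantage is 1/2, the largest possible value; an IND-CPA secure scheme must keep this quantity negligible for every PPT adversary.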

2.4 One-Time Signatures 

We also briefly recall the definition of one-time signatures [Lam79]. Let $k$ be
a security parameter.

Definition 5. A one-time signature scheme SIG is a tuple (Gen, Sign, Verify),
such that

- $\mathsf{Gen}(1^k)$ is a PPT-algorithm that takes a security-parameter $k$ and
outputs a pair of verification and signature keys $(vk, sgk)$.

- $\mathsf{Sign}_{sgk}(m)$ is a PPT-algorithm that takes a signature key $sgk$,
a message $m$ and outputs a signature $\sigma$.

- $\mathsf{Verify}_{vk}(m, \sigma)$ is a PPT-algorithm taking as input a
verification key $vk$, a message $m$ and a signature $\sigma$ and outputs a bit
$b \in \{0, 1\}$.

We require one-time signature schemes to be correct.

Definition 6. We say that SIG = (Gen, Sign, Verify) is correct, if it holds for
all messages $m$ that

$$\Pr[\mathsf{Verify}_{vk}(m, \mathsf{Sign}_{sgk}(m)) = 1 : (vk, sgk) = \mathsf{Gen}(1^k)] > 1 - \mathrm{negl}(k).$$

Moreover, we require existential unforgeability under one-time chosen message
attacks (EUF-CMA security), specified by the following experiment. Let
$\mathcal{A}$ be an adversary.

Experiment: EUF-CMA

- Generate a pair of keys $(vk, sgk) = \mathsf{Gen}(1^k)$. Give $\mathcal{A}$
access to a signing-oracle $\mathsf{Sign}_{sgk}(\cdot)$ that signs one message
$m^*$ of $\mathcal{A}$'s choice and then outputs $\bot$ for any further
signing-queries. Run $\mathcal{A}$ on input $vk$.

- Once $\mathcal{A}$ outputs a pair $(m, \sigma)$ with $m \neq m^*$, compute
$b = \mathsf{Verify}_{vk}(m, \sigma)$ and output $b$. Otherwise output 0.

Definition 7. We say that SIG = (Gen, Sign, Verify) is EUF-CMA secure, if it
holds for every PPT-adversary $\mathcal{A}$ that
$\Pr[\text{EUF-CMA}(\mathcal{A}) = 1] < \mathrm{negl}(k)$.

EUF-CMA secure one-time signature schemes can be constructed from any one-way
function [Lam79].
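As an illustration of such a construction, the following is a minimal sketch of a Lamport-style one-time signature. It is not taken from the paper. It assumes that SHA-256 behaves as a one-way function, and, because messages are hashed before signing, it additionally relies on collision resistance; the names `gen`, `sign` and `verify` simply mirror Definition 5.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    """Assumed one-way function (SHA-256)."""
    return hashlib.sha256(data).digest()


def _bits(message: bytes):
    """The 256 bits of the message digest, one signing position per bit."""
    digest = _h(message)
    return [(byte >> i) & 1 for byte in digest for i in range(8)]


def gen():
    """Return (vk, sgk): two secret preimages per position and their images."""
    sgk = [[secrets.token_bytes(32), secrets.token_bytes(32)] for _ in range(256)]
    vk = [[_h(pair[0]), _h(pair[1])] for pair in sgk]
    return vk, sgk


def sign(sgk, message: bytes):
    """Reveal, for every position, the preimage selected by the digest bit."""
    return [sgk[i][bit] for i, bit in enumerate(_bits(message))]


def verify(vk, message: bytes, signature) -> bool:
    """Check every revealed preimage against the verification key."""
    if len(signature) != 256:
        return False
    return all(_h(sig) == vk[i][bit]
               for i, (sig, bit) in enumerate(zip(signature, _bits(message))))
```

A key pair from `gen()` must be used for a single message only: signing two different messages reveals both preimages at every position where their digests differ, which is exactly why the EUF-CMA experiment allows one signing query.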

3 The Hardness-Assumption 

The basic problem we will base the security of our scheme upon is a variant of 
the decisional learning parity with noise (LPN) problem. Roughly speaking, the 
LPN problem asks to distinguish a number of noisy samples of a linear function 
(specified by a secret vector x) from uniform random. The variant considered here 
differs from the standard LPN problem in two aspects. First, the distinguisher
is provided only a linear number of samples, rather than an arbitrary polynomial
number. Second, the noise-level in this variant is significantly lower than in
the standard LPN problem. While the standard LPN problem comes with an
error-distribution that flips each output-bit with a small, but constant
probability, for this variant the probability is sub-constant. More precisely,
we will work with a bit-flip probability of the order $O(n^{-1/2-\epsilon})$ for
some small constant $\epsilon$. Here, $n$ is the size of the secret $x$ in bits.

Problem 1. Let $n \in \mathbb{N}$ be a problem parameter, $m = O(n)$,
$\epsilon > 0$ and $p = p(n) = O(n^{-1/2-\epsilon})$. Let
$A \in \mathbb{F}_2^{m \times n}$ be chosen uniformly at random,
$x \in \mathbb{F}_2^n$ be chosen uniformly at random and $e$ according to
$\chi_p^m$. The problem is, given $A$ and $y$, to decide whether $y$ is
distributed according to $Ax + e$ or chosen uniformly at random.
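To make the two distributions of Problem 1 concrete, the following sketch samples an instance $(A, y = Ax + e)$ and a uniform instance. It is an illustration with toy parameters; the function names are hypothetical, and the secret and the error are returned only so that the relation $y - Ax = e$ can be inspected.

```python
import random


def bernoulli_vector(length, p, rng):
    """Sample a vector from chi_p^length."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def mat_vec(A, x):
    """Matrix-vector product over F_2."""
    return [sum(a * b for a, b in zip(row, x)) % 2 for row in A]


def lpn_instance(n, m, eps, rng):
    """Sample (A, y) with y = A*x + e, noise rate p = n^(-1/2 - eps)."""
    p = n ** (-0.5 - eps)
    A = [[rng.randrange(2) for _ in range(n)] for _ in range(m)]
    x = [rng.randrange(2) for _ in range(n)]
    e = bernoulli_vector(m, p, rng)
    y = [(u + v) % 2 for u, v in zip(mat_vec(A, x), e)]
    return A, y, x, e


def uniform_instance(n, m, rng):
    """Sample (A, y) with y uniformly random and independent of A."""
    A = [[rng.randrange(2) for _ in range(n)] for _ in range(m)]
    y = [rng.randrange(2) for _ in range(m)]
    return A, y
```

A distinguisher for Problem 1 receives only `(A, y)` and has to tell which of the two samplers produced it.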

Currently, the best classical algorithms to attack Problem 1 require time of
the order $2^{\Omega(n^{1/2-\epsilon})}$ [Ste88, MMT11, BLP11, BJMM12]. Moreover,
there are no quantum algorithms known performing significantly better than the
best classical algorithms. In our constructions we will choose $n$ by
$n = O(k^{2/(1-2\epsilon)})$, where $k$ is the security parameter. This
normalizes the hardness of Problem 1 to $2^{\Theta(k)}$. Thus, we choose $p$ by
$p(k) = O(k^{-(1+2\epsilon)/(1-2\epsilon)})$. In the full version of this paper,
we provide a reduction establishing the hardness of Problem 1 based on the
hardness-assumption used in [Ale03], which uses a different error-distribution.
It will be necessary to use a normal-form (as in [ACPS09]) of Problem 1 in our
cryptographic constructions, which is stated in Problem 2. In this normal-form,
the secret $x$ is drawn from the noise-distribution.

Problem 2. Let $n \in \mathbb{N}$ be a problem parameter, $m = O(n)$,
$\epsilon > 0$ and $p = O(n^{-1/2-\epsilon})$. Let
$A \in \mathbb{F}_2^{m \times n}$ be chosen uniformly at random, $x$ be
distributed according to $\chi_p^n$ and $e$ be distributed according to
$\chi_p^m$. The problem is, given $A$ and $y$, to decide whether $y$ is
distributed according to $Ax + e$ or chosen uniformly at random.

The hardness of Problem 2 can be established by a simple reduction from
Problem 1 given in the full version of this paper. By a simple hybrid-argument,
it follows that a matrix-version of Problem 2 is also hard.

Problem 3. Let $n \in \mathbb{N}$ be a problem parameter, $m, k = O(n)$,
$\epsilon > 0$ and $p = O(n^{-1/2-\epsilon})$. Let
$A \in \mathbb{F}_2^{n \times k}$ be chosen uniformly at random,
$T \in \mathbb{F}_2^{m \times n}$ be distributed according to
$\chi_p^{m \times n}$ and $X$ be distributed according to $\chi_p^{m \times k}$.
The problem is, given $A$ and $B$, to decide whether $B$ is distributed
according to $TA + X$ or chosen uniformly at random in
$\mathbb{F}_2^{m \times k}$.

In the security-proof for our schemes, we will use Problem 3 to establish
pseudorandomness of the public keys, while we use Problem 2 to establish
pseudorandomness of the ciphertexts.

4 Outline of the Techniques 

In this Section, we will outline the techniques used to construct an IND-CCA1
secure scheme based on the hardness of Problem 2 and Problem 3. We will
provide the full presentation in the subsequent sections. Let henceforth
$p = O(n^{-1/2-\epsilon})$ for a small constant $\epsilon > 0$.

We will start with a rough outline of a scheme that encrypts single bits and
has a substantial decryption-error. On a technical level, this first building
block resembles the schemes of Regev [Reg05] and the Dual-Regev Scheme of Gentry
et al. [GPV08] (which both live in the LWE realm). Public keys for our scheme
are pairs $(A, b^T)$, where $A \in \mathbb{F}_2^{l_1 \times n}$ is chosen
uniformly at random and $b^T = t^T A + x^T$, with $t \in \mathbb{F}_2^{l_1}$
distributed by $\chi_p^{l_1}$ and $x \in \mathbb{F}_2^n$ by $\chi_p^n$. The
secret key is $t^T$. To encrypt a message $m \in \mathbb{F}_2$, sample $s$
according to $\chi_p^n$, $e_1$ according to $\chi_p^{l_1}$ and $e_2$ according
to $\chi_p$. Compute $c = (As + e_1, b^T s + e_2 + m)$ and output $c$. To
decrypt a ciphertext $c = (c_1, c_2)$, compute $y = c_2 - t^T c_1$ and output
$y$. The output $y$ is a noisy version of the plaintext $m$, since it holds that

$$y = c_2 - t^T c_1 = b^T s + e_2 + m - t^T (As + e_1) = m + t^T A s + x^T s + e_2 - t^T A s - t^T e_1 = m + x^T s + e_2 - t^T e_1.$$

By the properties of the distribution $\chi_p$, the error-term
$v = x^T s + e_2 - t^T e_1$ is 0 with probability bounded away from 1/2, i.e. it
holds $y = m$ with substantial probability.
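The single-bit scheme outlined above can be sketched in a few lines. This is an illustration with toy parameters rather than a secure instantiation; the function names are hypothetical, and `keygen` and `encrypt` additionally return their internal randomness only so that the error-term $v = x^T s + e_2 - t^T e_1$ can be inspected (over $\mathbb{F}_2$ subtraction and addition coincide).

```python
import random


def bern(length, p, rng):
    """Sample a vector from chi_p^length."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def dot(u, v):
    """Inner product over F_2."""
    return sum(a * b for a, b in zip(u, v)) % 2


def keygen(n, l1, p, rng):
    """Public key (A, b) with b^T = t^T A + x^T; the secret key is t."""
    A = [[rng.randrange(2) for _ in range(n)] for _ in range(l1)]
    t = bern(l1, p, rng)
    x = bern(n, p, rng)
    b = [(sum(t[i] * A[i][j] for i in range(l1)) + x[j]) % 2 for j in range(n)]
    return (A, b), (t, x)          # x is returned for inspection only


def encrypt(pk, m, p, rng):
    """c = (A s + e_1, b^T s + e_2 + m) for a single bit m."""
    A, b = pk
    s = bern(len(b), p, rng)
    e1 = bern(len(A), p, rng)
    e2 = bern(1, p, rng)[0]
    c1 = [(dot(row, s) + e1[i]) % 2 for i, row in enumerate(A)]
    c2 = (dot(b, s) + e2 + m) % 2
    return (c1, c2), (s, e1, e2)   # randomness is returned for inspection only


def decrypt(t, c):
    """y = c_2 - t^T c_1, a noisy version of m."""
    c1, c2 = c
    return (c2 + dot(t, c1)) % 2
```

Running `decrypt` on many fresh encryptions of the same bit and counting how often the output differs from $m$ gives an empirical estimate of the decryption-error for the chosen toy parameters.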

This decryption-error can be dealt with by encoding $m$ (which is now a
bit-vector of length $n$) using an error-correcting code as follows. Let
$G \in \mathbb{F}_2^{l_2 \times n}$ be the generator-matrix of a binary linear
error-correcting code $\mathcal{C}$. The modified scheme works as follows.
Public keys are of the form $(A, B)$ with $A$ as above and $B = TA + X$, where
$T$ is chosen from $\chi_p^{l_2 \times l_1}$ and $X$ from
$\chi_p^{l_2 \times n}$. The secret key is $T$. Messages $m \in \mathbb{F}_2^n$
are encrypted as $c = (As + e_1, Bs + e_2 + Gm)$, with $s, e_1, e_2$ sampled
from the corresponding $\chi_p$ distributions. Decryption computes
$y = c_2 - T c_1 = Gm + Xs + e_2 - T e_1$. Since the matrices $T$ and $X$ were
chosen from a $\chi_p$ distribution, they are good (as defined in Section 2.2)
with overwhelming probability. Thus the error-term $v = Xs + e_2 - T e_1$ has a
low hamming-weight and we can use the decoding-procedure of $\mathcal{C}$ to
recover $m$. The IND-CPA security of this scheme follows easily by the hardness
of Problem 2 and Problem 3. However, we will require a witness-recovering
IND-CPA scheme for the construction of our IND-CCA scheme. A scheme is witness
recovering if the decryption recovers the randomness used to encrypt. For the
above scheme however, the vector $s$ is "lost" during decryption. We circumvent
this problem by using some sort of key-encapsulation. Instead of encrypting a
plaintext-vector $m$ using the above scheme, we encrypt the witness $s$ (which
has the same size as $m$). We will then use another instance of Problem 2 to
encrypt the plaintext $m$ (using $s$ as symmetric key). Encrypting the witness
$s$ instead of $m$ will not harm security. By Problem 3 the matrix $B$ is
pseudorandom. Therefore, the matrix $B + G$ is also pseudorandom. Thus, the
second part of the ciphertext $c_2 = Bs + e_2 + Gs = (B + G)s + e_2$ is also
pseudorandom by Problem 2. Observe that we do not need the entire secret key $T$
to recover $s$ from a ciphertext $c$. Let
$y = c_2 - T c_1 = Gs + Xs + e_2 - T e_1$. To recover the $i$-th component $y_i$
of $y$, we merely need the $i$-th row $t_i$ of the matrix $T$. If we possess a
sufficient amount of the rows of $T$, yet not all of them, we can still recover
$s$ by computing $y_i$ for all the $i$ for which $t_i$ is known and setting
$y_i = \bot$ (erasure) otherwise. We can now recover $s$ by performing a
combined error- and erasure-correction on $y$ using the decoding algorithm of
$\mathcal{C}$. If it is guaranteed that the number of erasures is very low, we
can simply set all erasures to random values (thereby introducing a few
additional random errors) and use the standard decoding-algorithm
$\mathsf{Decode}_{\mathcal{C}}$ of $\mathcal{C}$. Micciancio and Peikert [MP12]
recently used a very similar witness-recovering mechanism in their construction
of an improved LWE-based IND-CCA2 scheme. While our construction uses
off-the-shelf binary error-correcting codes to encode the witness $s$, they
needed to construct a special family of lattices for this purpose. These
lattices have a short dual basis and an efficient decoding algorithm, thus they
can be seen as a Euclidean analogue to efficiently decodable error-correcting
codes with large minimum distance.

We can now give an outline of our IND-CCA1 construction. It is an adaptation of
the all-but-one simulation-paradigm [PW08, RS09] to the special structure of our
CPA scheme. The key-generation samples not just one, but $q$ (for a constant
$q$) matrices $B_1, \ldots, B_q$ and $T_1, \ldots, T_q$. Encryption first
samples a tag $\tau$, then derives an instance-public-key $B_\tau$ from
$B_1, \ldots, B_q$. It further proceeds as the IND-CPA variant using the matrix
$B_\tau$ instead of $B$. The ciphertext is $(\tau, c)$. Decryption takes the tag
$\tau$, derives an instance secret-key $T_\tau$ and uses $T_\tau$ to decrypt
$c$. After recovering the random coins it checks whether they satisfy a certain
hamming-weight criterion. If not, it aborts, otherwise it outputs the plaintext
$m$. The instance-key derivation will assemble the matrix $B_\tau$ by picking
certain rows from the matrices $B_1, \ldots, B_q$ depending on the tag $\tau$.
In the security proof, there will be a single tag $\tau^*$ for which the
simulator is completely oblivious of the instance-secret key $T_{\tau^*}$ (this
is the tag where the IND-CPA challenge will be embedded). For all other tags,
the simulator needs to be able to simulate a decryption-oracle. This means that
no other instance-secret-key $T_\tau$ should share too many rows with
$T_{\tau^*}$. If this is the case, the simulator will be able to use an
incomplete secret key to answer decryption-queries by the above observation. To
guarantee that the instance-secret-keys $T_\tau$ have small overlap with one
another, we will use a $q$-ary error-correcting encoding for the tags $\tau$.
This simulation-strategy requires that the hamming-weight of the
ciphertext-noise satisfies a certain bound, otherwise the simulator is unable to
correct the additional erasures caused by the incomplete secret key. This is the
reason why the decryption needs to check the hamming-weight of the witnesses.

The IND-CCA2 construction is obtained by replacing the randomly chosen tags
$\tau$ with the verification keys of a one-time signature scheme and appending
an according signature to the ciphertext. This transformation has been used in
several contexts to obtain CCA2 secure encryption from different primitives
[DDN00, CHK04, PW08, RS09, DMQN09]. The encryption primitives admitting such a
transformation can be generalized under the notion of tag-based encryption
schemes [Kil06].
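The observation that $y = c_2 - T c_1$ can be computed row by row, with unknown rows treated as erasures and filled with random bits, can be sketched as follows. This is an illustration only; `partial_syndrome` is a hypothetical name, and the subsequent decoding step with $\mathsf{Decode}_{\mathcal{C}}$ is left out.

```python
import random


def dot(u, v):
    """Inner product over F_2."""
    return sum(a * b for a, b in zip(u, v)) % 2


def partial_syndrome(known_rows, c1, c2, rng):
    """Compute y = c_2 - T c_1 from an incomplete key.

    known_rows[i] is the i-th row t_i of T, or None if that row is unavailable.
    """
    y = []
    for t_i, c2_i in zip(known_rows, c2):
        if t_i is None:
            y.append(rng.randrange(2))           # erasure replaced by a random bit
        else:
            y.append((c2_i + dot(t_i, c1)) % 2)  # y_i = c_{2,i} - t_i c_1 over F_2
    return y
```

Every position with a known row agrees with the full computation, so the number of additional errors handed to the decoder is at most the number of missing rows.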

5 The IND-CPA Scheme 

In this Section we will provide the full construction of an IND-CPA secure en- 
cryption scheme. We will use this scheme in the construction of our CCA1 secure 
scheme. 

Let $k$ be a security parameter, $n \in O(k^{2/(1-2\epsilon)})$,
$l_1, l_2, l_3 \in O(k^{2/(1-2\epsilon)})$ and
$p = O(k^{-(1+2\epsilon)/(1-2\epsilon)})$. Let
$G \in \mathbb{F}_2^{l_2 \times n}$ be the generator-matrix of a binary linear
error-correcting code $\mathcal{C}$ and $\mathsf{Decode}_{\mathcal{C}}$ an
efficient decoding procedure for $\mathcal{C}$ that corrects up to $\alpha l_2$
errors (for a constant $\alpha$). Further let
$\mathcal{D} \subseteq \mathbb{F}_2^{l_3}$ be a binary error-correcting code
with efficient encoding $\mathsf{Encode}_{\mathcal{D}}$ and error-correction
$\mathsf{Decode}_{\mathcal{D}}$ that corrects up to $\lambda l_3$ errors.

Construction 1. The scheme $\mathsf{PKE}_1$ = (KeyGen, Enc, Dec) is specified by

- $\mathsf{KeyGen}(1^k)$: Sample matrices $A \in \mathbb{F}_2^{l_1 \times n}$
and $C \in \mathbb{F}_2^{l_3 \times n}$ uniformly at random, sample the matrix
$T$ from $\chi_p^{l_2 \times l_1}$ and the matrix $X$ from
$\chi_p^{l_2 \times n}$. Set $B = G + T \cdot A + X$. Set $pk = (A, B, C)$ and
$sk = T$. Output $(pk, sk)$.

- $\mathsf{Enc}_{pk}(m)$: Takes a public key $pk = (A, B, C)$ and a plaintext
$m \in \mathbb{F}_2^n$ as input, samples $s$ from $\chi_p^n$, $e_1$ from
$\chi_p^{l_1}$, $e_2$ from $\chi_p^{l_2}$ and $e_3$ from $\chi_p^{l_3}$. It sets
$c_1 = A \cdot s + e_1$, $c_2 = B \cdot s + e_2$ and
$c_3 = C \cdot s + e_3 + \mathsf{Encode}_{\mathcal{D}}(m)$. Output
$c = (c_1, c_2, c_3)$.

- $\mathsf{Dec}_{sk}(c)$: Takes a secret key $sk = T$ and a ciphertext
$c = (c_1, c_2, c_3)$ as input. Computes $y = c_2 - T \cdot c_1$ and
$s = \mathsf{Decode}_{\mathcal{C}}(y)$. Outputs $\bot$ if decoding fails.
Otherwise computes $m = \mathsf{Decode}_{\mathcal{D}}(c_3 - C \cdot s)$ and
outputs $m$.

We will now show that this scheme is correct, i.e. the probability that a
decryption-error occurs is negligible in $k$.

Lemma 1. The scheme $\mathsf{PKE}_1$ is correct.

Proof. Decryption only fails if one of the two decoding operations fails. We
will thus bound the probability of failure for both decoding operations. It
holds that

$$y = c_2 - T \cdot c_1 = B \cdot s + e_2 - T (A \cdot s + e_1) = G \cdot s + X \cdot s + e_2 - T \cdot e_1.$$

Thus, it is sufficient to bound the hamming-weight of the error-term
$v = X \cdot s + e_2 - T \cdot e_1$. Fix constants $\beta, \gamma > 0$ such that
$2\beta + \gamma p < \alpha$ and $\gamma p < \lambda$. By a Chernoff-bound, it
holds that $|s| \le \gamma pn$, $|e_1| \le \gamma p l_1$,
$|e_2| \le \gamma p l_2$ and $|e_3| \le \gamma p l_3$ with overwhelming
probability in $k$. The decoding procedure $\mathsf{Decode}_{\mathcal{C}}$ can
correct up to $\alpha l_2$ errors. With overwhelming probability in $k$, both
matrices $X$ and $T$ are $(\beta, \gamma p)$-good (see Section 2.2). Thus it
holds that $|Xs| \le \beta l_2$ and $|T e_1| \le \beta l_2$ (for sufficiently
large $k$). All together, it holds that

$$|v| \le |Xs| + |e_2| + |T e_1| \le 2\beta l_2 + \gamma p l_2 < \alpha l_2.$$

Therefore, the decoding-procedure $\mathsf{Decode}_{\mathcal{C}}$ will
successfully recover $s$. Moreover, $\mathsf{Decode}_{\mathcal{D}}$ will
successfully recover $m$ as $|e_3| \le \gamma p \cdot l_3 < \lambda l_3$.

We now turn to proving IND-CPA security of the scheme $\mathsf{PKE}_1$.

Theorem 1. Assume that Problem 2 is hard. Then the scheme $\mathsf{PKE}_1$ is
IND-CPA secure.

Proof. Let $\mathcal{A}$ be a PPT-bounded IND-CPA adversary against
$\mathsf{PKE}_1$. Consider the following sequence of games.

- Game 1: This is the IND-CPA experiment.

- Game 2: This is the same as game 1, except that during key-generation, the
matrix $B$ is chosen uniformly at random by the experiment.

- Game 3: The same as game 2, except that during encryption of the
challenge-ciphertext, $c^* = (c_1^*, c_2^*, c_3^*)$ is chosen uniformly at
random.

Clearly, $\mathcal{A}$'s advantage of winning game 3 is zero, as the
challenge-ciphertext $c^*$ is statistically independent of the challenge bit $b$
chosen by the experiment. It remains to show that the views of $\mathcal{A}$ are
computationally indistinguishable in game 1, 2 and 3. For contradiction, assume
that $\mathcal{A}$ distinguishes game 1 and game 2 with non-negligible advantage
$\nu_1(k)$. We will construct a distinguisher $\mathcal{B}_1$ that distinguishes
the distributions $(A, T \cdot A + X)$ and $(A, U)$ with advantage $\nu_1(k)$,
contradicting the hardness of Problem 3. The input of $\mathcal{B}_1$ is an
instance $(A^\dagger, B^\dagger)$. $\mathcal{B}_1$ simulates the interaction
with $\mathcal{A}$ in the same way as game 1 does, except for the key generation
step. Instead of generating $A$ and $B$ as in game 1, it sets $A = A^\dagger$
and $B = G + B^\dagger$. After the simulation terminates, $\mathcal{B}_1$
outputs whatever $\mathcal{A}$ outputs. Clearly, if $(A^\dagger, B^\dagger)$ is
chosen according to $(A, T \cdot A + X)$, then $\mathcal{A}$'s view in
$\mathcal{B}_1$'s simulation is identically distributed as in game 1. On the
other hand, if $(A^\dagger, B^\dagger)$ is distributed according to $(A, U)$,
then $\mathcal{A}$'s view in $\mathcal{B}_1$'s simulation is identical to
game 2. Thus it holds that
$|\Pr[\mathcal{B}_1(A, TA + X)] - \Pr[\mathcal{B}_1(A, U)]| = |\Pr[\mathrm{view}_{\mathcal{A}}(\mathrm{Game}_1)] - \Pr[\mathrm{view}_{\mathcal{A}}(\mathrm{Game}_2)]| \ge \nu_1(k)$,
which contradicts the hardness of Problem 3. Now assume that $\mathcal{A}$
distinguishes between game 2 and game 3 with non-negligible advantage
$\nu_2(k)$. We will construct a distinguisher $\mathcal{B}_2$ that distinguishes
the distributions $(M, Ms + e)$ and $(M, u)$ with advantage $\nu_2(k)$,
contradicting the hardness of Problem 2. Let the input of $\mathcal{B}_2$ be
$(M, r)$, where $M \in \mathbb{F}_2^{(l_1 + l_2 + l_3) \times n}$ and
$r \in \mathbb{F}_2^{l_1 + l_2 + l_3}$. $\mathcal{B}_2$ first partitions $M$ in
three matrices $M_1 \in \mathbb{F}_2^{l_1 \times n}$,
$M_2 \in \mathbb{F}_2^{l_2 \times n}$ and $M_3 \in \mathbb{F}_2^{l_3 \times n}$.
Likewise, it partitions $r$ into $r_1 \in \mathbb{F}_2^{l_1}$,
$r_2 \in \mathbb{F}_2^{l_2}$ and $r_3 \in \mathbb{F}_2^{l_3}$. $\mathcal{B}_2$
simulates the interaction with $\mathcal{A}$ exactly like game 2, except for two
details. In the key-generation step, it sets $A = M_1$, $B = M_2$ and $C = M_3$.
Moreover, the challenge-ciphertext $c^* = (c_1^*, c_2^*, c_3^*)$ is set by
$c_1^* = r_1$, $c_2^* = r_2$ and
$c_3^* = r_3 + \mathsf{Encode}_{\mathcal{D}}(m_b)$. After the simulation
terminates, $\mathcal{B}_2$ outputs whatever $\mathcal{A}$ outputs. Clearly, if
$(M, r)$ is chosen according to $(M, Ms + e)$, then $\mathcal{A}$'s view is
identically distributed to game 2. On the other hand, if $(M, r)$ is distributed
according to $(M, u)$, then $\mathcal{A}$'s view is identically distributed to
game 3. Therefore, it holds that
$|\Pr[\mathcal{B}_2(M, Ms + e)] - \Pr[\mathcal{B}_2(M, u)]| = |\Pr[\mathrm{view}_{\mathcal{A}}(\mathrm{Game}_2)] - \Pr[\mathrm{view}_{\mathcal{A}}(\mathrm{Game}_3)]| \ge \nu_2(k)$,
which contradicts the hardness of Problem 2. This concludes the proof.

6 The IND-CCA1 Scheme 

In this Section, we will construct an IND-CCA1 scheme based on the scheme
$\mathsf{PKE}_1$ constructed in the last section. We will extend the encryption
and decryption algorithms with an instance-key derivation step, that assigns a
tag to each ciphertext and derives an instance public or secret key for each
tag. These instance-keys will be used as keys for $\mathsf{PKE}_1$. Moreover, we
need to ensure that decryption only outputs a plaintext if an incomplete key
would have already been sufficient to decrypt. Decryption therefore checks if
the hamming-weight of the randomness used to encrypt is small enough. When the
scheme is used honestly, this is the case with overwhelming probability. As in
the last section, let $k$ be a security parameter,
$n \in O(k^{2/(1-2\epsilon)})$, $l_1, l_2, l_3 \in O(k^{2/(1-2\epsilon)})$ and
$p = O(k^{-(1+2\epsilon)/(1-2\epsilon)})$. Let
$G \in \mathbb{F}_2^{l_2 \times n}$ be the generator-matrix of a binary linear
error-correcting code $\mathcal{C}$ and $\mathsf{Decode}_{\mathcal{C}}$ an
efficient decoding procedure that corrects up to $\alpha l_2$ errors (for a
constant $\alpha$). Let $\mathcal{D} \subseteq \mathbb{F}_2^{l_3}$ be a binary
error-correcting code with efficient encoding $\mathsf{Encode}_{\mathcal{D}}$
and error-correction $\mathsf{Decode}_{\mathcal{D}}$ as before. Let
$\mathcal{E} \subseteq \Sigma^{l_2}$ be a $q$-ary code over the alphabet
$\Sigma$ (with $q = |\Sigma|$) with relative minimum-distance $\delta$ and
dimension $n$. Such a code can be generated randomly (see Section 2.1). We will
now explain how the parameters $\delta$ and $q$ must be chosen. Recall that
$\mathsf{Decode}_{\mathcal{C}}$ corrects up to $\alpha l_2$ errors. As explained
earlier, $\alpha$ must be big enough to correct the decryption-error, which has
hamming-weight less than $(2\beta + \gamma p) l_2$ (for any constant
$\beta > 0$). As the additional error induced by erasures will have
hamming-weight at most $(1 - \delta) l_2$, it is sufficient to choose $\delta$
(which must be smaller than $1 - 1/q$) such that
$2\beta + \gamma p + 1 - \delta < \alpha$. As we can choose $\beta$ and $\gamma$
arbitrarily small, we can always find $q$ and $\delta$ such that the above is
met. Therefore, fix $\beta$, $\gamma$, $q$ and $\delta$ such that for
sufficiently large $n$ it holds that $2\beta + \gamma p + 1 - \delta < \alpha$.
We can choose the constant $\beta$ arbitrarily small and it holds that
$\gamma p \in o(1)$. There exist constructions of efficiently decodable linear
codes $\mathcal{C}$ such that $\alpha$ is slightly larger than 1/400 [Zem01].
Thus we can choose $q$ as small as $q > 1/(\alpha - 2\beta - \gamma p) > 400$.
We remark that this might be drastically improved if a more sophisticated joint
error-and-erasure correction mechanism than ours was used. Our naive mechanism
simply treats erasures as errors, but there might be much more efficient
mechanisms, maybe allowing to choose $q$ as small as 2.

Construction 2. The scheme PKE2 = (KeyGen, Enc, Dec) is specified by 

— KeyGen(l fc ): Sample matrices A £ F^ 1 *" and C £ uniformly at ran- 

dom. For every j £ U sample a matrix Tj from xi 2 xil and a matrix Xj from 
X l p Xn - Set Bj = G+Tj-A+Xj . Setpk = {A, (Bj)j e z, C ) and sk = {Tj)j eS . 
Output ( pk , sk). 

— Enc p k(m): Takes a public key pk = {A, (Bj)j e s,C) and a plaintext mgFJ 
as input. Write each Bj as Bj = (bj i 1, . . . ,bj t i 2 ) T (The bj t are the rows of 
Bj). Sample a tag r £ E n uniformly at random and set t = Encode^ (r). It 
then sets Bf = (bf lt 1, . . . , b?, 2 ,i 2 ) T , i.e. the i-th row of Bf is bf u i- Encryption 
now samples s from x", ei from e 2 from x l p an d e3 from x l f ■ dt sets 
c\ = A - s + ei, C2 = Bf ■ s + e2 and C3 = C ■ s + e3 + Encodeu(m). Output 
c= (r, ci, C2, C3). 

— Dec s fc(c): Takes a secret key sk = (Tj)j e s and a ciphertext c = (r,ci, 02,03) 

as input. Write each Tj as Tj = (tj t i, . . . ,tj,i 2 ) T (The tj fi are the rows of 
Tj). Then it computes r = Encodef(r) and Tj = . . . ,tfi 2 ,h) T ■ Next it 

computes y = 02 — If • ci and s = Decodec (y) • Outputs T if decoding fails. 
Otherwise compute m = Decode© (03 — C-s). Now it computes e\ = ci — A-s, 
e-2 = C2 — Bf ■ s, e3 = 03 — C- s— Encode© (m) and checks whether |s| < 7 pn, 
|ei| < 'yph, |e2| < "fph and |es| < 7/9Z3. If yes it outputs m, otherwise _L. 
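The instance-key derivation of Construction 2 only selects rows: row $i$ of the instance key is row $i$ of the matrix indexed by the $i$-th symbol of the encoded tag. The sketch below illustrates this with hypothetical helper names and takes the encoded tag as given, i.e. the tag code $\mathcal{E}$ itself is not implemented.

```python
def derive_instance_key(matrices, codeword):
    """Row i of the result is row i of matrices[codeword[i]].

    `matrices` maps each alphabet symbol j to a matrix (B_j or T_j) given as a
    list of rows; `codeword` is the encoded tag, one symbol per row index.
    """
    return [matrices[symbol][i] for i, symbol in enumerate(codeword)]


def shared_rows(codeword_a, codeword_b):
    """Row indices on which the instance keys of two encoded tags coincide."""
    return [i for i, (a, b) in enumerate(zip(codeword_a, codeword_b)) if a == b]
```

Two instance keys share exactly the rows on which the encoded tags agree, so a tag code with relative minimum-distance $\delta$ limits the overlap of any two distinct instance keys to at most $(1 - \delta) l_2$ rows.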

Correctness of PKE2 follows immediately from the correctness of PKE© The 
only additional step is the check of the hamming weights |s|, |ei|, |e2| and |e3|. 
However, this has been dealt with implicitly in Lemma 0 We will now prove 
IND-CCA1 security for the scheme PKE2.

Theorem 2. The scheme PKE2 is IND-CCA1 secure, provided that the scheme
PKE1 is IND-CPA secure and the parameters $\alpha$, $\beta$, $\gamma$, $q$ and $\delta$ satisfy $\delta < 1 - 1/q$
and $2\beta + \gamma\rho + 1 - \delta < \alpha$.
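
The two parameter conditions of Theorem 2 can be checked mechanically. The sketch below uses exact rational arithmetic; the function names and the sample values for $\alpha$, $\beta$, $\gamma$ and $\rho$ are hypothetical and only chosen so that $\alpha$ is slightly larger than $1/400$. A suitable $\delta$ exists only if $2\beta + \gamma\rho + 1 - \alpha < 1 - 1/q$, which is the same as $q > 1/(\alpha - 2\beta - \gamma\rho)$.

```python
from fractions import Fraction
from math import floor

def theorem2_conditions(alpha, beta, gamma, rho, delta, q):
    """Check delta < 1 - 1/q and 2*beta + gamma*rho + 1 - delta < alpha."""
    distance_ok = delta < 1 - Fraction(1, q)
    budget_ok = 2 * beta + gamma * rho + 1 - delta < alpha
    return distance_ok and budget_ok

def min_alphabet_size(alpha, beta, gamma, rho):
    """Smallest integer q with q > 1/(alpha - 2*beta - gamma*rho), or None."""
    slack = alpha - 2 * beta - gamma * rho
    if slack <= 0:
        return None                      # no alphabet size can work
    return floor(1 / slack) + 1          # smallest integer strictly above 1/slack

# Hypothetical sample parameters (not taken from the paper).
alpha = Fraction(1, 399)
beta = Fraction(1, 4000)
gamma = 2
rho = Fraction(1, 10000)

q_min = min_alphabet_size(alpha, beta, gamma, rho)
slack = alpha - 2 * beta - gamma * rho

def midpoint_delta(q):
    """Candidate delta halfway between the two bounds it must respect."""
    return 1 - (slack + Fraction(1, q)) / 2

ok_at_q_min = theorem2_conditions(alpha, beta, gamma, rho, midpoint_delta(q_min), q_min)
ok_below = theorem2_conditions(alpha, beta, gamma, rho, midpoint_delta(q_min - 1), q_min - 1)
```

With these sample values the smallest admissible alphabet size is above 400, in line with the remark that precedes Construction 2.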

Proof. Let $\mathcal{A}$ be a PPT-bounded IND-CCA1 adversary against PKE2. Consider the
following sequence of games.

— Game 1: This is the IND-CCA1 experiment.

— Game 2: This is the same as game 1, except that the tag $\tau^*$ of the challenge-
ciphertext $c^* = (\tau^*, c_1^*, c_2^*, c_3^*)$ is chosen before the experiment starts, and
game 2 aborts if $\mathcal{A}$ sends a decryption-query with tag $\tau^*$.

— Game 3: This is the same as game 2, except that the decryption-oracle
is implemented differently. For a decryption-query $c = (\tau, c_1, c_2, c_3)$ the
decryption-oracle proceeds as follows. Let $\hat{\tau} = \mathsf{Encode}_{\mathcal{E}}(\tau)$. For all $i \in
\{1, \ldots, l_2\}$ with $\hat{\tau}_i \ne \hat{\tau}^*_i$, it computes $y_i = c_{2,i} - t_{\hat{\tau}_i,i} \cdot c_1$. For all remaining
$i$ it chooses $y_i$ uniformly at random. The decryption-oracle then continues
like in game 2, computing $s = \mathsf{Decode}_{\mathcal{C}}(y)$ (and aborts if decoding fails)
and $m = \mathsf{Decode}_{\mathcal{D}}(c_3 - C \cdot s)$, setting $e_1 = c_1 - A \cdot s$, $e_2 = c_2 - B_{\hat{\tau}} \cdot s$,
$e_3 = c_3 - C \cdot s - \mathsf{Encode}_{\mathcal{D}}(m)$ and checking whether $|s| \le \gamma\rho n$, $|e_1| \le \gamma\rho l_1$,
$|e_2| \le \gamma\rho l_2$ and $|e_3| \le \gamma\rho l_3$. If yes it outputs $m$, otherwise $\bot$.

In game 2, the event that $\mathcal{A}$ sends a decryption-query with tag $\tau^*$ has probability
at most $f(k)/q^n = \mathsf{negl}(k)$, where $f(k)$ is a polynomial upper bound for
the number of decryption-queries $\mathcal{A}$ makes. If this event does not occur, game 1
and game 2 are identically distributed from $\mathcal{A}$'s view. Thus, from $\mathcal{A}$'s view game
1 and game 2 are statistically indistinguishable.

We will now show that game 2 and game 3 are statistically indistinguishable from
$\mathcal{A}$'s view. First, assume that for every tag $\tau$ the matrices $T_{\hat{\tau}}$ and $X_{\hat{\tau}}$ are
$(\beta, \gamma\rho)$-good. If this is the case, we claim that the decryption oracles of game 2
and game 3 behave identically. We split the claim in two cases. The first case is
simple: If either $|s| > \gamma\rho n$, $|e_1| > \gamma\rho l_1$, $|e_2| > \gamma\rho l_2$ or $|e_3| > \gamma\rho l_3$, then the
decryption oracle will return $\bot$ in both games, regardless of whether decoding
fails or not. In the other case it holds that $|s| \le \gamma\rho n$, $|e_1| \le \gamma\rho l_1$,
$|e_2| \le \gamma\rho l_2$ and $|e_3| \le \gamma\rho l_3$. Now it holds that the Hamming weight of the
error-term $v = X_{\hat{\tau}} \cdot s + e_2 - T_{\hat{\tau}} \cdot e_1$ will be bounded by $2\beta l_2 + \gamma\rho l_2$. Thus, in
game 2 the decoding-algorithm $\mathsf{Decode}_{\mathcal{C}}$ has to correct at most
$(2\beta + \gamma\rho) l_2 < \alpha l_2$ errors and will thus be successful and output the unique
$s$. In game 3, there might be up to $(1 - \delta) l_2$ additional errors $\mathsf{Decode}_{\mathcal{C}}$ has to
deal with, as the decryption oracle chooses up to $(1 - \delta) l_2$ components of the
codeword $y$ at random. However, since $(2\beta + \gamma\rho + 1 - \delta) l_2 < \alpha l_2$ the decoding-
algorithm $\mathsf{Decode}_{\mathcal{C}}$ will also succeed in game 3 and output the unique $s$. This
concludes the claim.

What remains to show for this part of the proof is that, with overwhelming
probability in $k$, it holds that for every tag $\tau$ the matrices $T_{\hat{\tau}}$ and $X_{\hat{\tau}}$ are
$(\beta, \gamma\rho)$-good. We can think of each matrix $T_{\hat{\tau}}$ as a row-sub-matrix of
a large matrix $T_{full} \in \mathbb{F}_2^{q l_2 \times l_1}$ that consists of all the rows of all $T_i$ for $i \in \Sigma$ (i.e.
$T_{full}$ is just the vertical concatenation of all $T_i$). With overwhelming probability
in $k$, $T_{full}$ is $(\beta/q, \gamma\rho)$-good (since $q$ is constant). This means that for each $e_1$
with $|e_1| \le \gamma\rho l_1$ it holds that $|T_{full} e_1| \le \beta/q \cdot (q l_2) = \beta l_2$. However, as each
$T_{\hat{\tau}}$ is a row-sub-matrix of $T_{full}$, it also holds that $|T_{\hat{\tau}} e_1| \le \beta l_2$. Showing that
$|X_{\hat{\tau}} s| \le \beta l_2$ works analogously, which concludes this part of the proof.

Finally,
$\mathcal{A}$'s advantage of winning game 3 is negligible in $k$, given that PKE1 is IND-
CPA secure. Assume for contradiction that $\mathcal{A}$ wins game 3 with non-negligible
advantage $\nu(k)$. We will construct an IND-CPA adversary $\mathcal{B}$ against PKE1 that
wins the IND-CPA experiment with advantage $\nu$. $\mathcal{B}$'s input from the IND-CPA
experiment is a public key $pk' = (A', B', C')$ for the scheme PKE1. $\mathcal{B}$ now runs
the key-generation of game 3 with the following modifications. Instead of
sampling the matrices $A$ and $C$ uniformly at random, it sets $A = A'$ and $C = C'$.
Now it generates the $B_j$ and $T_j$ exactly like the key-generation in game 3. Then,
however, it replaces the public-key at the locations that constitute $B_{\hat{\tau}^*}$ with $B'$,
i.e. it sets $b_{\hat{\tau}^*_i,i} = b'_i$ for $i = 1, \ldots, l_2$, where the $b'_i$ are the rows of $B'$. $\mathcal{B}$ then
simulates the interaction between $\mathcal{A}$ and game 3, answering decryption-queries
like game 3. This is possible, as game 3 never uses the secret keys $t_{\hat{\tau}^*_i,i}$ (that
correspond to the public keys $b_{\hat{\tau}^*_i,i}$) to answer decryption queries. Once $\mathcal{A}$ sends
challenge messages $(m_0, m_1)$, $\mathcal{B}$ forwards $(m_0, m_1)$ to the IND-CPA experiment
and receives a challenge-ciphertext $(c_1^*, c_2^*, c_3^*)$. $\mathcal{B}$ sends $c^* = (\tau^*, c_1^*, c_2^*, c_3^*)$
to $\mathcal{A}$ and continues the simulation. Once $\mathcal{A}$ terminates, $\mathcal{B}$ outputs whatever
$\mathcal{A}$ outputs. From $\mathcal{A}$'s view, $\mathcal{B}$'s simulation and game 3 are perfectly
indistinguishable, as the distributions of $A$ and $C$ are the same, as well as the
distribution of the partial public keys $b_{j,i}$, which are independent of one another
(only depending on the same $A$). Moreover, the decryption-oracle behaves
identically in both experiments. Therefore, it holds that
$\mathsf{Adv}_{\text{IND-CPA}}(\mathcal{B}) = \mathsf{Adv}_{\text{IND-CCA1}}(\mathcal{A}) = \nu(k)$, which contradicts the IND-CPA
security of scheme PKE1.

7 The IND-CCA2 Scheme 

We will now provide details on how the scheme PKE2 can be transformed into an
IND-CCA2 secure scheme PKE3 using additional one-time signatures. We follow
an approach by Dolev, Dwork and Naor [DDN00], which has been used in
several other constructions [PW08, Pei09, RS09, DMQN09, MP12], especially
in the world of lattice and coding assumptions, to achieve full CCA2 security.
First observe that it is not necessary to choose the tag $\tau \in \Sigma^n$ uniformly at
random in the encryption procedure of PKE2. We only need to guarantee that
a PPT-adversary $\mathcal{A}$ will have negligible probability of guessing the secret tag $\tau^*$
correctly if it is granted a polynomial number of trials (this immediately yields
the statistical indistinguishability of game 1 and game 2 in Theorem 2). Thus
it is sufficient to sample the tags $\tau$ from a distribution with high min-entropy.
Moreover, observe that the proof of Theorem 2 still holds if we allow $\mathcal{A}$ to
make decryption-queries even after it has received the challenge-ciphertext $c^*$.
This can be seen by noting that the decryption-oracle in game 3 can answer
decryption-queries with $\tau \ne \tau^*$ regardless of whether the challenge-ciphertext
has been given to $\mathcal{A}$ or not (decryption-queries with $\tau = \tau^*$ are rejected
unconditionally). In fact, the decryption-oracle in game 3 is oblivious of whether
the challenge-ciphertext has been given to $\mathcal{A}$ or not. Thus, the scheme PKE2
can be recast as a tag-based encryption scheme [Kil06].

We will now outline PKE3. Let SIG = (Gen, Sign, Verify) be an EUF-CMA secure
one-time signature scheme. For simplicity, assume that the verification-keys $vk$
of SIG are elements of $\Sigma^n$ (this can always be accomplished by encoding $vk$ in
the $q$-ary alphabet $\Sigma$ and choosing $n$ large enough). The key-generation of PKE3
is identical to the key-generation of PKE2. The encryption procedure PKE3.Enc
first computes a pair of verification and signature-keys $(vk, sgk) = \mathsf{SIG.Gen}(1^k)$.
Then it runs the encryption procedure PKE2.Enc, with the difference that it sets
$\tau = vk$ instead of choosing $\tau$ uniformly at random. Let $c'$ be the output of
PKE2.Enc. PKE3.Enc then computes $\sigma = \mathsf{SIG.Sign}_{sgk}(c')$ and outputs the
ciphertext $c = (c', \sigma)$. The decryption procedure PKE3.Dec first checks if $\sigma$ is a
valid signature on $c'$ using the verification-key $vk = \tau$ (where $\tau$ is the tag given
in $c'$). If the check succeeds, it runs the decryption procedure PKE2.Dec on the
ciphertext $c'$ and outputs whatever PKE2.Dec outputs. We summarize this in the
following construction. Let $\mathsf{Enc}'_{pk}(m, vk)$ be a procedure that does exactly the
same as PKE2.Enc$_{pk}(m)$, but sets $\tau = vk$ instead of choosing $\tau$ uniformly at random.

Construction 3. The scheme PKE 3 = (KeyGen, Enc, Dec) is specified by 

— KeyGen($1^k$): Compute $(pk, sk) = \mathsf{PKE2.KeyGen}(1^k)$ and output $(pk, sk)$.

— Enc$_{pk}(m)$: Generate $(vk, sgk) = \mathsf{SIG.Gen}(1^k)$, encrypt $c' = \mathsf{Enc}'_{pk}(m, vk)$,
sign $\sigma = \mathsf{SIG.Sign}_{sgk}(c')$ and output $c = (c', \sigma)$.

— Dec$_{sk}(c)$: Let $c = (c', \sigma)$ and $c' = (\tau, c_1, c_2, c_3)$. Set $vk = \tau$. Check if
$\mathsf{SIG.Verify}_{vk}(c', \sigma) = 1$, if not abort. Otherwise compute $m = \mathsf{PKE2.Dec}_{sk}(c')$
and output $m$.
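
The control flow of Construction 3 can be sketched as a wrapper around any tag-based encryption scheme. In the sketch below the one-time signature is a Lamport signature built from SHA-256, which is a swapped-in choice and not something the construction prescribes, and `tbe_keygen`, `tbe_enc` and `tbe_dec` are a deliberately insecure symmetric stand-in for PKE2 (the public key equals the secret key). All function names are hypothetical, and verification keys are byte strings rather than elements of $\Sigma^n$.

```python
import hashlib
import secrets

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def _bits(msg: bytes):
    """The 256 bits of SHA-256(msg), most significant bit first."""
    digest = _h(msg)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]

def sig_gen():
    """Lamport one-time key pair: 256 pairs of secret preimages."""
    sgk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    vk = b"".join(_h(a) + _h(b) for a, b in sgk)
    return vk, sgk

def sig_sign(sgk, msg: bytes) -> bytes:
    return b"".join(sgk[i][bit] for i, bit in enumerate(_bits(msg)))

def sig_verify(vk: bytes, msg: bytes, sig: bytes) -> bool:
    if len(vk) != 256 * 64 or len(sig) != 256 * 32:
        return False
    for i, bit in enumerate(_bits(msg)):
        start = 64 * i + 32 * bit
        if _h(sig[32 * i:32 * i + 32]) != vk[start:start + 32]:
            return False
    return True

# Toy stand-in for the tag-based scheme (NOT public-key, NOT secure).
def tbe_keygen():
    key = secrets.token_bytes(32)
    return key, key

def tbe_enc(pk: bytes, m: bytes, tag: bytes):
    if len(m) > 32:
        raise ValueError("toy scheme handles at most 32 bytes")
    pad = _h(pk + tag)
    return tag, bytes(x ^ y for x, y in zip(m, pad))

def tbe_dec(sk: bytes, c_prime):
    tag, body = c_prime
    pad = _h(sk + tag)
    return bytes(x ^ y for x, y in zip(body, pad))

def pke3_enc(pk: bytes, m: bytes):
    vk, sgk = sig_gen()                       # fresh one-time key per ciphertext
    c_prime = tbe_enc(pk, m, vk)              # the tag is the verification key
    sigma = sig_sign(sgk, c_prime[0] + c_prime[1])
    return c_prime, sigma

def pke3_dec(sk: bytes, c):
    c_prime, sigma = c
    vk = c_prime[0]                           # read the tag back as vk
    if not sig_verify(vk, c_prime[0] + c_prime[1], sigma):
        return None                           # reject: invalid signature
    return tbe_dec(sk, c_prime)
```

Because the tag has a fixed length in this sketch, concatenating the tag and the ciphertext body before signing is unambiguous.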

Theorem 3. The scheme PKE3 is IND-CCA2 secure, provided that SIG is an
EUF-CMA secure one-time signature scheme and the same requirements as in
Theorem 2 are met.

Proof. (Sketch) Let $\mathcal{A}$ be a PPT-bounded IND-CCA2 adversary against PKE3.
It suffices to show that with overwhelming probability, every decryption-query
by $\mathcal{A}$ tagged with $\tau^*$ (the tag of the challenge-ciphertext) is rejected. Thus, we
can recycle the proof of Theorem 2 almost entirely; we only need to replace the
argument for the indistinguishability of game 1 and game 2 in the proof of
Theorem 2. The rest of the proof is identical. Consider the following two games.

— Game 1: This is the IND-CCA2 experiment.

— Game 2: This is the same as game 1, except that the tag $\tau^*$ of the challenge-
ciphertext $c^* = (\tau^*, c_1^*, c_2^*, c_3^*, \sigma^*)$ is generated before the experiment starts,
and game 2 aborts if $\mathcal{A}$ sends a decryption-query with tag $\tau^*$.

Assume that $\mathcal{A}$ distinguishes between game 1 and game 2 with non-negligible
advantage $\nu(k)$. Clearly, given that the decryption-oracle rejects every decryption-
query tagged with $\tau^*$, both games are identically distributed from $\mathcal{A}$'s view.
Thus, to distinguish game 1 and game 2, $\mathcal{A}$ must generate a decryption-query
tagged with $\tau^*$ that is accepted by the decryption-oracle. This implies that such
a decryption-query $c = (c', \sigma)$ with $c' = (\tau^*, c_1, c_2, c_3)$ satisfies the condition
$\mathsf{SIG.Verify}_{vk}(c', \sigma) = 1$, where $vk = \tau^*$. Thus we can assume that $\mathcal{A}$ generates
such a decryption-query with probability $\nu(k)$. We construct an EUF-CMA
adversary $\mathcal{B}$ that breaks the EUF-CMA security of SIG with probability $\nu(k)$.
Let $vk$ be the verification key provided to $\mathcal{B}$ by the EUF-CMA experiment. $\mathcal{B}$
simulates game 2 with $\mathcal{A}$, but makes the following changes. Instead of generating
the tag $\tau^*$ itself, it sets $\tau^* = vk$. Moreover, $\mathcal{B}$ obtains the signature $\sigma^*$
of the challenge-ciphertext $c^*$ by querying its signature-oracle with $c'^*$, where
$c'^* = \mathsf{Enc}'_{pk}(m_b, vk)$. Finally, once $\mathcal{A}$ sends a decryption-query $c = (c', \sigma)$ with
$c' = (\tau^*, c_1, c_2, c_3)$ and $\mathsf{SIG.Verify}_{vk}(c', \sigma) = 1$, $\mathcal{B}$ outputs $(c', \sigma)$ and terminates.
Clearly, game 2 and the simulation of $\mathcal{B}$ are identically distributed from the
view of $\mathcal{A}$. Thus, the event that $\mathcal{A}$ sends a decryption-query $c = (c', \sigma)$ with
$c' = (\tau^*, c_1, c_2, c_3)$ and $\mathsf{SIG.Verify}_{vk}(c', \sigma) = 1$ happens with probability $\nu(k)$ in
$\mathcal{B}$'s simulation. This means that $\mathcal{B}$ outputs a valid forged signature with
probability $\nu(k)$, contradicting the EUF-CMA security of SIG.

8 Conclusion 

In this work we constructed the first IND-CCA2 secure public key encryption 
scheme based solely on the hardness of a low-noise variant of the learning parity 
with noise problem. To achieve this, we introduced a novel all-but-one simula- 
tion technique. This new technique enabled the construction of a CCA1 secure 
scheme, which is more efficient than any previous such construction based on 
the correlated-products approach. The scheme enjoys a constant-factor cipher- 
text expansion as well as asymptotically efficient key-generation, encryption and 
decryption. 
Abstract. This paper discusses the provable security of the compression
functions introduced by Knudsen and Preneel [11,12,13] that use linear
error-correcting codes to build wide-pipe compression functions from
underlying blockciphers operating in Davies-Meyer mode. In the
information theoretic model, we prove that the Knudsen-Preneel compression
function based on an $[r, k, d]_{2^e}$ code is collision resistant up to $2^{\frac{(r-d+1)n}{2r-3d+3}}$
query complexity if $2d \le r + 1$ and collision resistant up to $2^{\frac{rn}{2r-2d+2}}$
query complexity if $2d > r + 1$. For MDS code based Knudsen-Preneel
compression functions, this lower bound matches the upper bound
recently given by Özen and Stam [23].

A preimage security proof of the Knudsen-Preneel compression functions
was first presented by Özen et al. (FSE '10). In this paper,
we present two alternative proofs that the Knudsen-Preneel compression
functions are preimage resistant up to $2^{\frac{rn}{k}}$ query complexity. While the
first proof, using a wish list argument, is presented primarily to illustrate
an idea behind our collision security proof, the second proof provides a
tighter security bound compared to the original one.


1 Introduction 

A cryptographic hash function takes a message of arbitrary length, and returns a 
bit string of fixed length. The most common way of hashing variable length mes- 
sages is to iterate a fixed-size compression function (e.g. according to the Merkle-
Damgård paradigm [7,20]). The underlying compression function can either be
constructed from scratch, or be built upon off-the-shelf cryptographic primitives 
such as blockciphers. Recently, blockcipher-based constructions have attracted 
renewed interest as many dedicated hash functions, including those most com- 
mon in practical applications, have started to exhibit serious security weak- 
nesses |2lfill 811 !)l‘29l34l3r)fT7i] . By instantiating a blockcipher-based construction 
with an extensively studied (and fully trusted) blockcipher, one can conveniently 
transfer the trust in the existing blockcipher to the hash function. 
National Research Foundation of Korea(NRF) funded by the Ministry of Education,
Compression functions based on blockciphers have been widely studied
f5l4l9ll(114l22l25l2fil27l28l5UI51l52l3H| . The most common approach is to con- 
struct a 2n-to-n bit compression function using a single call to an n-bit
blockcipher. However, such a function, called a single-block-length (SBL) compression
function, might be vulnerable to collision attacks due to its short output length. 
For example, one could successfully mount a birthday attack on a compression 
function based on AES-128 using approximately 2 64 queries. This observation 
motivated substantial research on constructions whose output size is larger than 
the block length of the underlying blockcipher(s). A typical approach has been to 
construct double-block-length (DBL) hash functions, where the output length is 
twice the block length of the underlying blockcipher(s). Since the 1990s various 
double-block-length constructions have been proposed mostly without formal 
security proofs. Those constructions were mainly focused on optimizing their ef- 
ficiency in terms of the rate, while only recently have a few double-block-length 
constructions been supported by rigorous security proofs |8ll5ll7l2%j . 
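
The birthday estimate quoted above can be reproduced with a short calculation. The sketch uses the standard approximation $1 - e^{-q(q-1)/2^{n+1}}$ for the probability that $q$ uniformly random $n$-bit outputs contain a collision; the function name is hypothetical, and the approximation treats the compression function as a random function.

```python
import math

def birthday_collision_probability(log2_queries: float, output_bits: int) -> float:
    """Approximate collision probability for q = 2**log2_queries random n-bit outputs."""
    q = 2.0 ** log2_queries
    exponent = q * (q - 1) / 2.0 ** (output_bits + 1)
    return -math.expm1(-exponent)        # 1 - exp(-exponent), stable for tiny exponents

p_sbl = birthday_collision_probability(64, 128)    # single-block-length, AES-128 sized output
p_dbl = birthday_collision_probability(64, 256)    # double-block-length output, same query budget
```

At $2^{64}$ queries the estimate for a 128-bit output is already a constant probability, whereas for a 256-bit output it is still negligible, which is the motivation for outputs longer than the block length.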

The Knudsen-Preneel compression functions. On the other hand, Knudsen
and Preneel [11,12,13] adopted a different approach, aiming at achieving a
particular level of security using a given number of ideal compression functions
as building blocks. Specifically, they used $r$ independent $cn$-to-$n$ bit random
functions to build the entire compression function producing $rn$-bit outputs.
The parameter $c$ is typically two or three so that the inner primitives can be
constructed from $n$-bit key or $2n$-bit key blockciphers operating in Davies-Meyer
mode. The main idea of Knudsen and Preneel's approach lies in the method of
deriving the inputs to the inner primitives from the input to the entire
compression function. They used an $[r, k, d]$ linear error-correcting code over a finite field
in a way that its generator matrix extends a $kcn$-bit input to the entire
compression function to an $rcn$-bit string. This string is parsed into $r$ blocks of the same
size, and the blocks go into the inner primitives in parallel. The output of the 
entire compression function is the concatenation of the n-bit outputs obtained 
from the r inner primitives. This Knudsen-Preneel (KP) compression function is 
fed to the Merkle-Damgard transform, producing the final output via a random 
finalization function whose output size might depend on the security target. 

Due to the property of linear codes of minimum distance d, two different 
inputs to the KP compression function determine two sets of inputs to the inner 
primitives that are different at least at d positions. Based on this observation, 
Knudsen and Preneel made a certain plausible security assumption (see m 
Section 5]) which was used for their security proof that the KP compression 
function is collision resistant up to $2^{\frac{(d-1)n}{2}}$ query complexity. They also expected
that the KP compression function would be preimage resistant up to $2^{(d-1)n}$
query complexity. In order to maximize the query complexity, Knudsen and
Preneel suggested the use of MDS codes satisfying $d = r - k + 1$.

Attack history. For KP compression functions based on an MDS code, the
designers described preimage attacks matching their security conjecture, while
their collision attacks were far from tight for many of the parameter sets.
Afterwards Watanabe proposed a collision attack beating the original
conjecture for many cases. In particular, for $2k > r$ and $d \le k$, one could find a
collision with $k 2^n$ query complexity.

Özen, Shrimpton and Stam [21] presented a preimage attack of $2^{\frac{rn}{k}}$ query
complexity, far less than the bound of $2^{(d-1)n}$ that was originally conjectured
by the designers. By giving a preimage security proof, they proved that their
attack is tight. Their result also implies that one could expect a collision with
about $2^{\frac{rn}{2k}}$ queries.

Subsequently, Özen and Stam [23] presented new collision attacks using the
ideas of Watanabe and the preimage attack of Özen, Shrimpton and Stam. For
$2k > r$ and $d \le k$, their attacks require $2^{\frac{kn}{3k-r}}$ query complexity. This implies
that the KP compression functions do not achieve the security level they were
originally designed for. On the other hand, the tightness of their attack remained
an open question.

1.1 Our Contribution 

In this paper, we prove that the KP compression function based on an $[r, k, d]_{2^e}$
code is collision resistant up to $2^{\frac{(r-d+1)n}{2r-3d+3}}$ query complexity if $2d \le r + 1$ and
collision resistant up to $2^{\frac{rn}{2r-2d+2}}$ query complexity if $2d > r + 1$. For KP
compression functions based on an MDS code, this lower bound, simplified to $2^{\frac{kn}{3k-r}}$
for $2d \le r + 1$ and $2^{\frac{rn}{2k}}$ for $2d > r + 1$ respectively, matches the upper bound given
by [21,23]. For two parameter sets $[4, 2, 3]_8$ and $[5, 2, 4]_8$ such that $2d > r + 1$,
the collision security is proved up to the query complexity equal to or beyond
the block-size of the underlying blockciphers.
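
As a quick sanity check of the two cases, the sketch below evaluates the exponent of the collision bound as a rational multiple of $n$ and compares it with the simplified MDS forms. It assumes the bounds $2^{(r-d+1)n/(2r-3d+3)}$ for $2d \le r+1$ and $2^{rn/(2r-2d+2)}$ for $2d > r+1$ as stated above; the function names are hypothetical.

```python
from fractions import Fraction

def collision_exponent(r: int, d: int) -> Fraction:
    """Coefficient c such that the proven collision bound is 2**(c*n)."""
    if 2 * d <= r + 1:
        return Fraction(r - d + 1, 2 * r - 3 * d + 3)
    return Fraction(r, 2 * r - 2 * d + 2)

def mds_collision_exponent(r: int, k: int) -> Fraction:
    """Simplified form for MDS codes, where d = r - k + 1."""
    d = r - k + 1
    if 2 * d <= r + 1:
        return Fraction(k, 3 * k - r)
    return Fraction(r, 2 * k)

# MDS parameter sets (r, k, d) that appear in this paper.
mds_sets = [(16, 13, 4), (4, 2, 3), (6, 4, 3), (9, 7, 3), (5, 2, 4), (7, 4, 4), (10, 7, 4)]
exponents = {(r, k, d): collision_exponent(r, d) for r, k, d in mds_sets}
beyond_block_size = [p for p, c in exponents.items() if c >= 1]
```

Only the two parameter sets with $2d > r + 1$ reach an exponent of at least 1, i.e. a bound at or beyond $2^n$.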

Özen, Shrimpton and Stam [21] proved that the preimage finding advantage
of a $q$-query adversary is not greater than



where we set 5 = r< ' k ^ k in Theorem 10 of |2U • The upper bound ex (r, k) 


becomes negligible as $q$ gets much smaller than $2^{\frac{rn}{k}}$. In this paper, we present
two alternative preimage security proofs, where the second proof provides a
tighter security bound compared to the original one. Specifically, the preimage
finding advantage of a $q$-query adversary is upper bounded by



Our upper bound e 2 (r,k) is significantly smaller than (r. k) since (r,k) < 
(l)ei(r,k) 1+ ^. For example, for a [5, 3, 3]4 code based KP compression func- 
tion, we have ex(r, k) > while (r,k) = 4^r- 

Our first preimage security proof, using a wish-list argument, is presented 
primarily to illustrate an idea behind our collision security proof. This proof is 
Table 1. Provable security of Knudsen-Preneel constructions. Non-MDS parameters
in italic. The parameter sets satisfying $r + 1 > 2k$ are $[4, 2, 3]_8$ and $[5, 2, 4]_8$.
Code            | Primitive | Compression     | Collision                       |   | Preimage
                |           |                 | 2^{2n/3}        | 2^{2n/3}      | ✓ | 2^{3n/2}   | 2^{3n/2}
[16, 13, 4]_16  | 2n → n    | (16+10)n → 16n  | 2^{13n/23}      | 2^{13n/23}    |   | 2^{16n/13} | 2^{16n/13}
[4, 2, 3]_8     | 3n → n    | (4+2)n → 4n     | 2^{n} [21]      | 2^{n}         | ✓ | 2^{2n}     | 2^{2n}
[6, 4, 3]_8     | 3n → n    | (6+6)n → 6n     | 2^{2n/3}        | 2^{2n/3}      | ✓ | 2^{3n/2}   | 2^{3n/2}
[9, 7, 3]_8     | 3n → n    | (9+12)n → 9n    | 2^{7n/12}       | 2^{7n/12}     | ✓ | 2^{9n/7}   | 2^{9n/7}
[5, 2, 4]_8     | 3n → n    | (5+1)n → 5n     | 2^{5n/4} [21]   | 2^{5n/4}      | ✓ | 2^{5n/2}   | 2^{5n/2}
[7, 4, 4]_8     | 3n → n    | (7+5)n → 7n     | 2^{4n/5}        | 2^{4n/5}      | ✓ | 2^{7n/4}   | 2^{7n/4}
[10, 7, 4]_8    | 3n → n    | (10+11)n → 10n  | 2^{7n/11}       | 2^{7n/11}     |   | 2^{10n/7}  | 2^{10n/7}

tight only for the parameter sets of MDS codes. Table 1 summarizes these results
for 16 parameter sets proposed by the original designers.

Wish list argument. In the information-theoretic model, the most typical 
approach for a security proof has been upper bounding the probability that a 
single query of an adversary achieves a certain security goal (such as finding a 
collision or finding a preimage of a target image). The upper bound of the total 
adversarial advantage is obtained by multiplying this upper bound by the number
of queries allowed to the adversary. Most single-block-length constructions
can be analyzed in this way.

However, certain constructions might not allow an upper bound small enough 
to uniformly apply to all the queries. One of the techniques to address this diffi- 
culty is to define a certain bad event that happens with only small probability, 
and prove that it is hard for a single query to achieve an adversarial goal with- 
out the occurrence of the bad event. This approach was adopted in the collision 
security proof of MDC-2 and MJH hash functions [16,24] as well as the preimage
security proof of the KP compression functions [21].

Another technique is to cleverly modify the adversary: the modified adver- 
sary, typically using the original adversary as a subroutine, is given slightly 
more power than the original one. So the success probability of the modified 
adversary is not reduced, while it becomes much easier to upper bound. With 
this approach, one can prove the security of Abreast-DM and Tandem-DM hash 
functions [8,15,17]. Our second alternative preimage security proof of the KP
compression functions also follows this approach.

As yet another technique, one might use an observation that a security goal is 
usually achieved by a group of queries and the last query that achieves the goal 
is uniquely determined by the previous queries in the group. We assume, once 
a new query is obtained, the adversary computes a query that might become 
the last winning query along with a certain group of existing queries (including 
the new query) . If this query has not been asked, the adversary includes it in a 
wish list expecting this wish is accomplished sometime later. If we have upper 
bounds on the size of the wish list (hopefully smaller than the total number of 
queries) and the probability that each wish in the list is accomplished, the total 
adversarial advantage can be obtained by a union bound. This technique, called 
a wish list argument, was first used in the preimage security proof of certain 
double-length blockcipher-based compression functions [1]. This work is the first
application of a wish list argument to a collision security proof (combined with 
a bad event argument). In our extension, each wish is typically given as a set of 
unasked queries, rather than a single query. 

Efficiency. Unfortunately, for most of the parameter sets, the KP compression 
functions do not provide collision security beyond the block-size of the underly- 
ing blockcipher. However, from a practical point of view, some of the KP com- 
pression functions are still comparable to the existing blockcipher-based hash 
functions such as MDC-2, Abreast-DM and Tandem-DM in terms of efficiency
and provable security.

In MDC-2, compression of a single n-bit message block requires two calls to 
the underlying n-bit key blockcipher, and it enjoys a ^-bit collision security 
proof. This construction is comparable to the KP compression functions using 
[12, 9, 3] 4 , [16, 12, 4] 4 or [8, 6, 3] is codes: they are all of rate \ using 2n-to-n bit 
primitives (or equivalently n-bit key blockciphers), and supported by a qf-bit 
security proof. 

The compression function $H = \mathrm{KP}^1([6, 4, 3]_8)$ using 3n-to-n bit primitives (or
equivalently 2n-bit key blockciphers) is supported by a $\frac{2n}{3}$-bit security proof.
This construction has the same rate and the same provable security as MJH [16]
using a 2n-bit key blockcipher.

The compression function $H = \mathrm{KP}^1([4, 2, 3]_8)$ using 3n-to-n bit primitives (or
equivalently 2n-bit key blockciphers) is supported by an n-bit security proof.
This construction is comparable to Abreast-DM and Tandem-DM, both of which
are of rate $\frac{1}{2}$ using a 2n-bit key blockcipher. We also refer to [5] for comparison
of this compression function with the other existing schemes in terms of AES
driven implementations.

The compression function $H = \mathrm{KP}^1([5, 2, 4]_8)$ is relatively slow with rate $\frac{1}{5}$,
while this is the first construction that enjoys provable collision security
beyond the block-size of the underlying blockciphers. However, it remains open
whether this KP compression function is still secure when the inner primitives
are instantiated with 2n-bit key n-bit blockciphers, since in general an n-bit
blockcipher loses its randomness beyond $2^n$ queries (for a fixed key). The other
open question raised here is the provable security of KP constructions where all
the inner primitives are instantiated with the same primitive.

2 Preliminaries 

2.1 The Knudsen-Preneel Compression Functions 

An $[r, k, d]_{2^e}$ linear error-correcting code $\mathcal{C}$ is a $k$-dimensional subspace of $\mathbb{F}_{2^e}^{r}$,
where $\mathbb{F}_{2^e}$ denotes a finite field of order $2^e$. An $[r, k, d]_{2^e}$ code $\mathcal{C}$ can be represented
by a $k \times r$ generator matrix $G$ over $\mathbb{F}_{2^e}$ where every codeword of $\mathcal{C}$ is expressed
as a linear combination of the row vectors of $G$, namely $w \cdot G$ for some $w \in \mathbb{F}_{2^e}^{k}$.
Obviously, $k \le r$, and the Singleton bound states that

$$d \le r - k + 1.$$

When a code meets the equality of the Singleton bound, it is called maximum
distance separable (MDS). As an important property of MDS codes, any $k$ columns
of a generator matrix of an MDS code are linearly independent.
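
The Singleton bound gives a one-line test for the MDS property. The sketch below computes the gap $(r - k + 1) - d$ for a few parameter sets mentioned in this paper; the function name and the dictionary of examples are hypothetical.

```python
def singleton_gap(r: int, k: int, d: int) -> int:
    """Return (r - k + 1) - d; a code is MDS exactly when the gap is 0."""
    if not (1 <= k <= r and d >= 1):
        raise ValueError("expected 1 <= k <= r and d >= 1")
    gap = r - k + 1 - d
    if gap < 0:
        raise ValueError("parameters violate the Singleton bound")
    return gap

codes = {
    "[5,3,3]_4": (5, 3, 3),
    "[8,6,3]_16": (8, 6, 3),
    "[12,9,3]_4": (12, 9, 3),
    "[16,12,4]_4": (16, 12, 4),
}
gaps = {name: singleton_gap(*params) for name, params in codes.items()}
mds_codes = sorted(name for name, gap in gaps.items() if gap == 0)
```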

Let $\mathbb{F}_{2^e} = \mathbb{F}_2(\omega)$ be an extension of $\mathbb{F}_2$ generated by the root $\omega$ of a primitive
polynomial $p(x)$ of degree $e$, and let $\mathbb{F}_2^e$ be an $e$-dimensional vector space over $\mathbb{F}_2$.
In order to clearly define the Knudsen-Preneel compression functions, we need
to identify $\mathbb{F}_{2^e}$ and $\mathbb{F}_2^e$ by a group isomorphism $\psi : \mathbb{F}_{2^e} \to \mathbb{F}_2^e$ such that

$$\psi(a_{e-1}\omega^{e-1} + \cdots + a_1\omega + a_0) = (a_{e-1}, \ldots, a_1, a_0)^T.$$

For each $g \in \mathbb{F}_{2^e}$, consider a map

$$\Phi(g) : \mathbb{F}_2^e \to \mathbb{F}_2^e, \qquad u \mapsto \psi(g \cdot \psi^{-1}(u)),$$

where "$\cdot$" denotes the field multiplication of $\mathbb{F}_{2^e}$. This is a linear map, so it is
associated with an $e \times e$ matrix over $\mathbb{F}_2$ with respect to the standard basis. We
will denote this matrix as $\phi(g)$. Since for every $g, h \in \mathbb{F}_{2^e}$,

1. $\Phi(g + h) = \Phi(g) + \Phi(h)$,

2. $\Phi(gh) = \Phi(g) \circ \Phi(h)$,

we also have $\phi(g + h) = \phi(g) + \phi(h)$ and $\phi(gh) = \phi(g)\phi(h)$ for all $g, h \in \mathbb{F}_{2^e}$.
This implies the map $\phi : \mathbb{F}_{2^e} \to \mathbb{F}_2^{e \times e}$ is a ring homomorphism.

Suppose that $\phi(g)$ is the identity matrix, or equivalently $\Phi(g)$ is the identity
map. Since this implies $g \cdot \psi^{-1}(u) = \psi^{-1}(u)$ for every $u \in \mathbb{F}_2^e$, $g$ should be the
multiplicative identity of $\mathbb{F}_{2^e}$. This implies again that $\phi$ is injective.

This injective ring homomorphism naturally extends to $\phi : \mathbb{F}_{2^e}^{r \times k} \to \mathbb{F}_2^{re \times ke}$
where $\phi$ is applied to each component and then $(\mathbb{F}_2^{e \times e})^{r \times k}$ is identified with
$\mathbb{F}_2^{re \times ke}$. Now we are ready to define the Knudsen-Preneel compression functions.
Definition 1. Let $\mathcal{C}$ be an $[r, k, d]_{2^e}$ linear code with a generator matrix $G \in
\mathbb{F}_{2^e}^{k \times r}$ and let $\phi : \mathbb{F}_{2^e} \to \mathbb{F}_2^{e \times e}$ be the injective ring homomorphism defined above.
Let $e = bc$ and $n = bn'$ for some positive integers $b, c, n, n'$, and let $ek > rb$.
Then the Knudsen-Preneel compression function

$$H = \mathrm{KP}^b([r, k, d]_{2^e}) : \{0, 1\}^{kcn} \to \{0, 1\}^{rn}$$

making oracle queries to public random functions $f_l : \{0, 1\}^{cn} \to \{0, 1\}^n$, $l =
1, \ldots, r$, computes $H(W)$ for $W \in \{0, 1\}^{kcn}$ as follows.

1. Compute $X \leftarrow (\phi(G^T) \otimes I_{n'}) \cdot W$.

2. Parse $X = (x_1, \ldots, x_r)$, where $x_1, \ldots, x_r \in \{0, 1\}^{cn}$.

3. Make oracle queries $y_l = f_l(x_l)$ for $l = 1, \ldots, r$, and output the digest $Z =
y_1 \| \cdots \| y_r$.

Here $\otimes$ denotes the Kronecker product and $I_{n'}$ the identity matrix in $\mathbb{F}_2^{n' \times n'}$.


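Example 1. The above mathematical description of Knudsen-Preneel constructions
looks complicated, while the constructions themselves are very simple. For
example, let $e = 2$ and let $\mathbb{F}_{2^2} = \mathbb{F}_2(\omega)$ for a root $\omega$ satisfying $\omega^2 + \omega + 1 = 0$.
For $a_1\omega + a_0 \in \mathbb{F}_{2^2}$,

$$\omega(a_1\omega + a_0) = (a_0 + a_1)\omega + a_1.$$

This implies $\phi(\omega) = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$. Since $\phi$ is an injective ring homomorphism,

$$\phi(0) = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \quad
\phi(1) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
\phi(\omega + 1) = \phi(\omega) + \phi(1) = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}.$$

Let $\mathcal{C}$ be a $[5, 3, 3]_4$ linear code with a generator matrix

$$G = \begin{pmatrix} 1 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & \omega \\ 0 & 0 & 1 & 1 & \omega^2 \end{pmatrix}.$$

If $c = 2$, then $b = 1$, $n = n'$ and

$$\phi(G^T) = \left(\begin{array}{cc|cc|cc|cc|cc}
1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\ \hline
0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 \\ \hline
0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 1 & 1
\end{array}\right)^{T}.$$

Let $H : \{0, 1\}^{6n} \to \{0, 1\}^{5n}$ be the resulting KP compression function using
five public random functions $f_l : \{0, 1\}^{2n} \to \{0, 1\}^n$, $l = 1, \ldots, 5$. Then for
$W = w_1 \| \cdots \| w_6$,

$$H(W) = f_1(x_1) \| f_2(x_2) \| f_3(x_3) \| f_4(x_4) \| f_5(x_5),$$

where $x_1 = (w_1 \| w_2)$, $x_2 = (w_3 \| w_4)$, $x_3 = (w_5 \| w_6)$,
$x_4 = (w_1 \oplus w_3 \oplus w_5 \| w_2 \oplus w_4 \oplus w_6)$ and
$x_5 = (w_1 \oplus w_3 \oplus w_4 \oplus w_6 \| w_2 \oplus w_3 \oplus w_5 \oplus w_6)$.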
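
The example can be replayed in a few lines of code. The sketch below assumes the encoding of $a_1\omega + a_0 \in \mathbb{F}_4$ as the integer $2a_1 + a_0$ and represents each $n$-bit block $w_i$ as a Python integer; the function names (`gf4_mul`, `phi`, `c_pre`) are hypothetical, and only the linear preprocessing is computed, not the random functions $f_l$.

```python
def gf4_mul(g: int, h: int) -> int:
    """Multiply in F_4 = F_2[w]/(w^2 + w + 1); a1*w + a0 is encoded as 2*a1 + a0."""
    a1, a0 = g >> 1, g & 1
    b1, b0 = h >> 1, h & 1
    # (a1 w + a0)(b1 w + b0) = a1 b1 w^2 + (a1 b0 + a0 b1) w + a0 b0, with w^2 = w + 1
    hi = (a1 & b1) ^ (a1 & b0) ^ (a0 & b1)
    lo = (a1 & b1) ^ (a0 & b0)
    return (hi << 1) | lo

def phi(g: int):
    """2x2 binary matrix of u -> g*u in the coordinates (a1, a0)."""
    cols = [gf4_mul(g, 2), gf4_mul(g, 1)]     # images of the basis vectors w and 1
    return [[c >> 1 for c in cols], [c & 1 for c in cols]]

def mat2_add(M, N):
    return [[M[i][j] ^ N[i][j] for j in range(2)] for i in range(2)]

def mat2_mul(M, N):
    return [[(M[i][0] & N[0][j]) ^ (M[i][1] & N[1][j]) for j in range(2)]
            for i in range(2)]

OMEGA = 2
G = [[1, 0, 0, 1, 1],
     [0, 1, 0, 1, OMEGA],
     [0, 0, 1, 1, gf4_mul(OMEGA, OMEGA)]]

def c_pre(W):
    """Map six n-bit blocks w1..w6 to the five inputs x1..x5, each a pair of blocks."""
    X = []
    for l in range(5):
        top = bottom = 0
        for j in range(3):
            M = phi(G[j][l])
            u, v = W[2 * j], W[2 * j + 1]
            top ^= (u if M[0][0] else 0) ^ (v if M[0][1] else 0)
            bottom ^= (u if M[1][0] else 0) ^ (v if M[1][1] else 0)
        X.append((top, bottom))
    return X

W = [0b0001, 0b0010, 0b0100, 0b1000, 0b0011, 0b0101]   # arbitrary 4-bit blocks
X = c_pre(W)
```

Feeding each pair in `X` to its own $2n$-to-$n$ bit function and concatenating the outputs would give the digest $H(W)$.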

Throughout this work, we will simply write $C^{PRE}(W) = (\phi(G^T) \otimes I_{n'}) \cdot W$. For
the security analysis of $H$, we need to state some properties of $C^{PRE}$.

Definition 2. Let $\mathcal{I} \subset [1, r]$ and let $(x^*_l)_{l \in \mathcal{I}} \in \prod_{l \in \mathcal{I}} \{0, 1\}^{cn}$.¹ $(x_1, \ldots, x_r) \in
(\{0, 1\}^{cn})^r$ is called an extension of $(x^*_l)_{l \in \mathcal{I}}$ if there exists an input $W \in \{0, 1\}^{kcn}$
such that $C^{PRE}(W) = (x_1, \ldots, x_r)$ and $x_l = x^*_l$ for $l \in \mathcal{I}$. We will say $(x^*_l)_{l \in \mathcal{I}}$
is valid if it has an extension.

For $\mathcal{I} = [1, r]$, valid tuples are exactly the images of $C^{PRE}$. Due to the linearity
of $C^{PRE}$ (with respect to bitwise xor "$\oplus$"), we have the following property.

Property 1. If $(x_l)_{l \in \mathcal{I}}$ and $(x'_l)_{l \in \mathcal{I}}$ are valid, then $(x_l \oplus x'_l)_{l \in \mathcal{I}}$ is also valid.

Property 2. Let $\mathcal{I}$ be a subset of $[1, r]$ such that $|\mathcal{I}| = r - d + 1$. If $(x^*_l)_{l \in \mathcal{I}} \in
\prod_{l \in \mathcal{I}} \{0, 1\}^{cn}$ is valid, then it has a unique extension.

Proof. Suppose that $(x_1, \ldots, x_r)$ and $(x'_1, \ldots, x'_r)$ are extensions of $(x^*_l)_{l \in \mathcal{I}}$.
Then $(x_1 \oplus x'_1, \ldots, x_r \oplus x'_r)$ is an extension of $(0)_{l \in \mathcal{I}}$. Since any nonzero
codeword in $\mathcal{C}$ has at least $d$ nonzero coordinates, we have $(x_1 \oplus x'_1, \ldots, x_r \oplus x'_r) =
(0, \ldots, 0)$, and hence $(x_1, \ldots, x_r) = (x'_1, \ldots, x'_r)$. □
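
Property 2 can be checked by brute force for a small code. The sketch below works at the level of single $\mathbb{F}_4$ symbols with the generator matrix $G = (1,0,0,1,1;\ 0,1,0,1,\omega;\ 0,0,1,1,\omega^2)$ of a $[5, 3, 3]_4$ code, encoding $a_1\omega + a_0$ as the integer $2a_1 + a_0$; the names are hypothetical. It enumerates all 64 codewords, computes the minimum distance, and tests whether every choice of $r - d + 1 = 3$ coordinates determines the codeword.

```python
from itertools import combinations, product

def gf4_mul(g: int, h: int) -> int:
    """Multiply in F_4 = F_2[w]/(w^2 + w + 1); a1*w + a0 is encoded as 2*a1 + a0."""
    a1, a0 = g >> 1, g & 1
    b1, b0 = h >> 1, h & 1
    hi = (a1 & b1) ^ (a1 & b0) ^ (a0 & b1)
    lo = (a1 & b1) ^ (a0 & b0)
    return (hi << 1) | lo

G = [[1, 0, 0, 1, 1],
     [0, 1, 0, 1, 2],
     [0, 0, 1, 1, 3]]          # 2 encodes omega, 3 encodes omega^2 = omega + 1

def encode(w):
    """Codeword w * G over F_4 (addition in F_4 is XOR of the encodings)."""
    word = []
    for l in range(5):
        s = 0
        for j in range(3):
            s ^= gf4_mul(w[j], G[j][l])
        word.append(s)
    return tuple(word)

codewords = [encode(w) for w in product(range(4), repeat=3)]
min_distance = min(sum(1 for s in c if s) for c in codewords if any(c))

def has_unique_extensions(index_set) -> bool:
    """True if distinct codewords never agree on all coordinates in index_set."""
    projections = {tuple(c[l] for l in index_set) for c in codewords}
    return len(projections) == len(codewords)

all_unique = all(has_unique_extensions(I) for I in combinations(range(5), 3))
```

With fewer than $r - d + 1$ coordinates the extension is no longer unique, which the same helper confirms for any pair of coordinates.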


2.2 Collision Resistance and Preimage Resistance 

In this section, we review security notions of collision resistance and preimage 
resistance in an information theoretic sense. In the collision resistance experi- 
ment, a computationally unbounded adversary A makes oracle queries to public 
random functions fi, l = 1, . . . , r, and records a query history Q, which is initial- 
ized as an empty set. When A makes a new query fi(x), a query-response pair 
(l,x,ft(x)) is added to Q0 We will loosely write (/, x) G Q indicating that the 
value of fi(x) has been determined by A's query. Furthermore, we will denote 
A’s i-th query as (l t ,x‘ l 2 ), i = 1, . . . ,q, indicating the i-th query is /;»(x l ). 

At the end of the collision-finding attack, $A$ would like to find queries

\[ (1, x^{i_1}), \ldots, (r, x^{i_r}), (1, x^{j_1}), \ldots, (r, x^{j_r}) \in Q \]

satisfying the following two conditions.

1. $(x^{i_1}, \ldots, x^{i_r})$ and $(x^{j_1}, \ldots, x^{j_r})$ are distinct valid tuples.

2. $f_1(x^{i_1}) \| \cdots \| f_r(x^{i_r}) = f_1(x^{j_1}) \| \cdots \| f_r(x^{j_r})$.

1 We regard $\prod_{l \in I} \{0,1\}^{cn}$ as the set of all functions from $I$ to $\{0,1\}^{cn}$. Thus, even in
case $|I| = |I'|$, $\prod_{l \in I} \{0,1\}^{cn} \neq \prod_{l \in I'} \{0,1\}^{cn}$ as long as $I \neq I'$. We also naturally
identify $(\{0,1\}^{cn})^r$ with $\{0,1\}^{crn}$.

2 Unless stated otherwise, we will not allow any redundant query. 
In this case, $(i_l, j_l)_{l \in [1,r]}$ is called an index sequence of a collision. The success
probability of $A$'s finding a collision is denoted by $\mathrm{Adv}^{\mathrm{col}}_H(A)$. The maximum
of $\mathrm{Adv}^{\mathrm{col}}_H(A)$ over the adversaries making at most $q$ queries is denoted by
$\mathrm{Adv}^{\mathrm{col}}_H(q)$.

In the preimage resistance experiment, $A$ chooses a target image $Z =
z_1 \| \cdots \| z_r$ at the beginning of the attack, where $z_1, \ldots, z_r \in \{0,1\}^n$. After making
a certain number of oracle queries to $f_l$, $l = 1, \ldots, r$, $A$ would like to find
queries

\[ (1, x^{i_1}), \ldots, (r, x^{i_r}) \in Q \]

such that $f_1(x^{i_1}) \| \cdots \| f_r(x^{i_r}) = z_1 \| \cdots \| z_r$. The success probability of $A$'s
finding a preimage is denoted by $\mathrm{Adv}^{\mathrm{pre}}_H(A)$, and $\mathrm{Adv}^{\mathrm{pre}}_H(q)$ is the maximum of
$\mathrm{Adv}^{\mathrm{pre}}_H(A)$ over the adversaries making at most $q$ queries. There might be several
definitions of preimage resistance according to the distribution of a target image.
The definition described here, called everywhere preimage resistance, is known
as the strongest version in the sense that an adversary chooses its target image
on its own.


3 Preimage Resistance Proofs 

In this section, we will give two preimage resistance proofs of the KP compression
functions. In both security proofs, we let $Z = z_1 \| \cdots \| z_r$ be the range point to
be inverted where $z_1, \ldots, z_r \in \{0,1\}^n$. When an adversary $A$ succeeds in finding
a preimage of $Z$, predicate $\mathrm{Pre}$ is set to true by definition. So we need to upper
bound the probability $\Pr[\mathrm{Pre}]$. Throughout this work, we will write $N = 2^n$.


3.1 The First Alternative Proof 

Consider a subset $T \subset [1, r]$ such that $|T| = r - d + 1$. With respect to
this subset, we define predicate $\mathrm{Pre}_T$, where $\mathrm{Pre}_T$ is true if $A$ obtains an index
sequence of a preimage $D = (i_l)_{l \in [1,r]}$ such that

1. $(l, x^{i_l}) \in Q$ and $f_l(x^{i_l}) = z_l$ for $l = 1, \ldots, r$.

2. $\max_{l \in T} \{i_l\} < \min_{l \in [1,r] \setminus T} \{i_l\}$.

By the second condition, $T$ specifies the function indices where the first $r - d + 1$
partial preimages are determined. More precisely, a partial preimage can be
defined as follows.

Definition 3. Let $T$ be a subset of $[1, r]$ such that $|T| = r - d + 1$. A sequence
of indices

\[ P = (i_l)_{l \in T} \]

is called a partial preimage at $T$ if $(l, x^{i_l}) \in Q$ and $f_l(x^{i_l}) = z_l$ for $l \in T$.
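We will upper bound $\Pr[\mathrm{Pre}]$ by using the following implication.

\[ \mathrm{Pre} \Rightarrow \bigvee_{T} \mathrm{Pre}_T \Rightarrow \mathrm{Bad}(M) \vee \bigvee_{T} \left( \neg \mathrm{Bad}(M) \wedge \mathrm{Pre}_T \right), \tag{1} \]

where the disjunctions range over the subsets $T \subset [1, r]$ of size $r - d + 1$, and the
parameterized predicate $\mathrm{Bad}(M)$, $M > 0$, is true if there exists a
subset $T \subset [1, r]$ of size $r - d + 1$ such that the number of partial preimages at
$T$ is greater than $M$.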

In order for a preimage finding adversary $A$ to set $\mathrm{Pre}_T$ to true, $A$ has to first
complete a partial preimage at $T$. If $(x^{i_l})_{l \in T}$ is valid at the point when a partial
preimage $P = (i_l)_{l \in T}$ is completed, then the remaining queries $(x_l)_{l \in [1,r] \setminus T}$ that
might complete a preimage of $Z$ along with $(x^{i_l})_{l \in T}$ are uniquely determined
by Property 2. Specifically, it is required that $f_l(x_l) = z_l$ for $l \in [1, r] \setminus T$. If any
of these evaluations has not been determined, we include $(x_l)_{l \in [1,r] \setminus T}$ into a
wish list $\mathcal{L}$, expecting all of these evaluations to happen sometime later. A single
query might include multiple wishes into $\mathcal{L}$ by completing multiple
partial preimages at $T$. However, a single partial preimage at $T$ is
associated with a unique element in $\mathcal{L}$. Therefore the size of $\mathcal{L}$ would be at most
$M$ without the occurrence of $\mathrm{Bad}(M)$. Since each wish would be accomplished
with probability $1/N^{|[1,r] \setminus T|} = 1/N^{d-1}$, we have the following upper bound.

\[ \Pr[\neg \mathrm{Bad}(M) \wedge \mathrm{Pre}_T] \le \sum_{i=1}^{M} \Pr[\text{the } i\text{-th wish is granted}] \le \frac{M}{N^{d-1}}. \tag{2} \]


In order to address the remaining problem of upper bounding the probability of 
Bad (M), we will define a random variable X that counts the number of partial 
preimages at T, and probabilistically upper bound the value of X using Markov’s 
inequality. 

Fix a subset $T \subset [1, r]$ of size $r - d + 1$, and define a random variable $X_P$
for each sequence $P = (i_l)_{l \in T} \in \prod_{l \in T} [1, q]$, where $X_P = 1$ if $(l, x^{i_l}) \in Q$ and
$f_l(x^{i_l}) = z_l$ for every $l \in T$, and $X_P = 0$ otherwise. If we define

\[ X = \sum_{P} X_P, \]

then $X$ counts the number of partial preimages at $T$. Since $|\prod_{l \in T} [1, q]| = q^{r-d+1}$
and

\[ \Pr[X_P = 1] \; (= \mathrm{Ex}(X_P)) \le \frac{1}{N^{r-d+1}}, \]

we have $\mathrm{Ex}(X) \le q^{r-d+1} / N^{r-d+1}$. Using Markov's inequality, for $M > 0$ we have

\[ \Pr[X > M] \le \frac{q^{r-d+1}}{M N^{r-d+1}}. \]
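Applying a union bound over subsets $T \subset [1, r]$ of size $r - d + 1$, we have

\[ \Pr[\mathrm{Bad}(M)] \le \binom{r}{r-d+1} \frac{q^{r-d+1}}{M N^{r-d+1}} = \binom{r}{d-1} \frac{q^{r-d+1}}{M N^{r-d+1}}. \tag{3} \]

By (1), (2) and (3), we have

\[ \Pr[\mathrm{Pre}] \le \binom{r}{d-1} \frac{q^{r-d+1}}{M N^{r-d+1}} + \binom{r}{d-1} \frac{M}{N^{d-1}}. \]

Let

\[ M = \left( \frac{q^{r-d+1}}{N^{r-2d+2}} \right)^{1/2} \]

by setting $q^{r-d+1}/(M N^{r-d+1}) = M/N^{d-1}$. Then we have

\[ \Pr[\mathrm{Pre}] \le 2 \binom{r}{d-1} \frac{q^{\frac{r-d+1}{2}}}{N^{\frac{r}{2}}}. \]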

The following theorem summarizes this result. 

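Theorem 1. Let $H$ be the Knudsen-Preneel compression function based on an
$[r, k, d]_{2^e}$ code. Then we have

\[ \mathrm{Adv}^{\mathrm{pre}}_H(q) \le 2 \binom{r}{d-1} \frac{q^{\frac{r-d+1}{2}}}{N^{\frac{r}{2}}}. \]

For MDS codes, where $d = r - k + 1$, we have

\[ \mathrm{Adv}^{\mathrm{pre}}_H(q) \le 2 \binom{r}{k} \frac{q^{\frac{k}{2}}}{N^{\frac{r}{2}}}. \]

Example 2. Let $H$ be based on a $[5, 3, 3]_4$ MDS code. Then Theorem 1 implies

\[ \mathrm{Adv}^{\mathrm{pre}}_H(q) \le \frac{20 q^{3/2}}{N^{5/2}}. \]

Therefore $H$ is preimage resistant up to $N^{5/3}$ query complexity.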
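The query threshold in Example 2 can be reproduced numerically. The sketch below assumes the bound of Theorem 1 has the form $2\binom{r}{d-1} q^{(r-d+1)/2} / N^{r/2}$ with $N = 2^n$; the function names are illustrative and not part of the original analysis.

```python
from fractions import Fraction
from math import comb


def theorem1_bound(r, d, log2_q, n):
    """Evaluate 2 * C(r, d-1) * q^((r-d+1)/2) / N^(r/2) for q = 2^log2_q, N = 2^n.

    Assumption: this is the form of the first preimage bound.
    """
    exponent = Fraction(r - d + 1, 2) * log2_q - Fraction(r, 2) * n
    return 2 * comb(r, d - 1) * 2.0 ** float(exponent)


def preimage_threshold_exponent(r, d):
    """Exponent t such that the q- and N-powers cancel at q = N^t."""
    return Fraction(r, r - d + 1)
```

For the $[5, 3, 3]_4$ code, `preimage_threshold_exponent(5, 3)` evaluates to `Fraction(5, 3)`, and the bound stays far below 1 for, say, $n = 128$ and $q = 2^{100}$.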


3.2 The Second Alternative Proof 

The main idea of this proof is based on the observation that for any set of $r$
queries to $f_1, \ldots, f_r$ that are in the range of $C^{\mathrm{PRE}}$, one can appoint $k$ queries
that expand the span. Whenever any of such queries is made by an adversary
$A$, we let the corresponding modified adversary $A'$ immediately make any other
queries that are added to the span. In this way, we can fix all the indices of
queries at which $A'$ obtains a full preimage of $Z$. This modification makes upper
bounding the preimage finding advantage of $A'$ much easier than that of $A$.
To be precise, let $H = \mathrm{KP}^b([r, k, d]_{2^e})$ be given with a generator matrix

\[ G = [G_1, G_2, \ldots, G_r] \]

where $G_i$ is a $k \times 1$ column matrix for $i = 1, \ldots, r$. ($G$ is not necessarily in
standard form.) Fix a sequence

\[ T = (l_1, l_2, \ldots, l_k) \in [1, r]^k \]

such that column matrices $G_{l_1}, \ldots, G_{l_k}$ are linearly independent (which implies
$l_1, l_2, \ldots, l_k$ are all different), and a sequence

\[ P = (i_1, i_2, \ldots, i_k) \in [1, q]^k \]

such that $i_1 < i_2 < \cdots < i_k$. If partial preimages $f_{l_1}(x^{i_1}) = z_{l_1}, \ldots, f_{l_k}(x^{i_k}) =
z_{l_k}$ are found,$^3$ then these queries uniquely determine the remaining $r - k$ queries
$x_l$, $l \in [1, r] \setminus T$, such that, setting $x_{l_j} = x^{i_j}$ for $l_j \in T$, $(x_l)_{l \in [1,r]}$ is an image
of $C^{\mathrm{PRE}}$. Specifically, each of the remaining queries is represented as a linear
combination of $x^{i_1}, \ldots, x^{i_k}$. We define predicate $\mathrm{Pre}_{T,P}$ where $\mathrm{Pre}_{T,P}$ is true if
the following two conditions are satisfied.

1. $(l_\alpha, x^{i_\alpha}) \in Q$ and $f_{l_\alpha}(x^{i_\alpha}) = z_{l_\alpha}$ for $\alpha = 1, \ldots, k$.

2. For all $l \in [1, r] \setminus T$, let $\alpha$ be the first index such that $G_l$ is represented as a
linear combination of $G_{l_1}, \ldots, G_{l_\alpha}$. $A$ obtains $f_l(x_l) = z_l$ after $A$ makes
the $i_\alpha$-th query. (Note that $x_l$ is determined as a linear combination of
$x^{i_1}, \ldots, x^{i_\alpha}$.)

Then we have the following implication.

\[ \mathrm{Pre} \Rightarrow \bigvee_{(T, P)} \mathrm{Pre}_{T,P}. \tag{4} \]

In order to prove the above implication, suppose that $A$ sets $\mathrm{Pre}$ to true by
obtaining $f_{l_1}(x^{i_1}) = z_{l_1}, \ldots, f_{l_r}(x^{i_r}) = z_{l_r}$ in an order of $i_1 < i_2 < \cdots < i_r$. From
the sequence $(l_1, \ldots, l_r) \in [1, r]^r$, we can extract a subsequence $T \in [1, r]^k$ using
the following algorithm.

    $T \leftarrow \emptyset$
    For $\alpha = 1, \ldots, r$,
        if $G_{l_\alpha}$ is not represented by a linear combination of $G_l$, $l \in T$, then
            $T \leftarrow T \cup \{l_\alpha\}$

Since $G$ is of rank $k$, we have $|T| = k$. We can also check that $\mathrm{Pre}_{T,P}$ is true
with $P = (i_\alpha)$ where $\alpha$ satisfies $l_\alpha \in T$.
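The extraction step above is a greedy selection of linearly independent columns. A minimal sketch follows, assuming for simplicity that the columns are vectors over $\mathbb{F}_2$ encoded as integer bitmasks (the construction itself works over $\mathbb{F}_{2^e}$); the helper names and the toy matrix in the usage note are hypothetical.

```python
def reduce_vector(vec, basis):
    """Reduce vec against an XOR basis (dict: leading bit -> basis vector)."""
    while vec:
        lead = vec.bit_length() - 1
        if lead not in basis:
            return vec          # vec has a component outside the current span
        vec ^= basis[lead]      # clear the leading bit and continue
    return 0                    # vec lies in the span of the basis


def extract_T(columns, order):
    """Scan function indices in `order` and keep those whose column enlarges the span.

    columns: dict mapping a function index l to its column G_l (int bitmask, assumption).
    order:   function indices (l_1, ..., l_r) in the order the preimages are obtained.
    """
    basis, T = {}, []
    for l in order:
        remainder = reduce_vector(columns[l], basis)
        if remainder:
            basis[remainder.bit_length() - 1] = remainder
            T.append(l)
    return T
```

With the toy columns `{1: 0b001, 2: 0b001, 3: 0b010, 4: 0b111, 5: 0b100}` and the order `[1, 2, 5, 3, 4]`, the sketch returns `[1, 5, 3]`; whatever the order, the result has length equal to the rank of the matrix.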

Sequence $P$ fixes the indices of queries when we need to obtain the partial
preimages of $z_l$ for $l \in T$. In order to fix the indices of queries from which we
obtain the remaining partial preimages, we construct a modified adversary $A'$
that uses $A$ as a subroutine. The behavior of $A'$ can be illustrated as follows.

3 Here we are using slightly different notations from Section 2.2 by assuming $x^{i_\alpha}$ is
queried to $f_{l_\alpha}$ not $f_\alpha$. This implies $l_\alpha = l^{i_\alpha}$ for $\alpha = 1, \ldots, k$.
1. Between $A$ and the random function oracles, $A'$ faithfully relays all of $A$'s
queries and the oracles' responses.

2. Once queries $f_{l_1}(x^{i_1}), \ldots, f_{l_\alpha}(x^{i_\alpha})$ are made for $\alpha = 1, \ldots, r$, $A'$ searches for
$G_l$ that is represented as a linear combination of $G_{l_1}, \ldots, G_{l_\alpha}$ with a nonzero
coefficient of $G_{l_\alpha}$.

3. For such an index $l$, query $x_l$ that is consistent with $x^{i_1}, \ldots, x^{i_\alpha}$ is determined
as a linear combination of $x^{i_1}, \ldots, x^{i_\alpha}$. $A'$ makes an additional query $f_l(x_l)$
without relaying the response to $A$. When $A$ makes a certain query, $A'$ might
need to make several additional queries, in which case we fix an order
between those queries.

In case $A$ requests any of the additional queries later, $A'$ would have to make a
redundant query. Including the redundant queries, the number of queries made
by $A'$ is at most $q + r - k$. In this way, $(T, P)$ induces new sequences

\[ T' = (l'_1, l'_2, \ldots, l'_r) \in [1, r]^r, \]

\[ P' = (i'_1, i'_2, \ldots, i'_r) \in [1, q + r - k]^r \]

such that $l'_\alpha$ are all distinct, $i'_1 < i'_2 < \cdots < i'_r$, and $A$ setting $\mathrm{Pre}_{T,P}$ to true
implies that $A'$ obtains $f_{l'_\alpha}(x^{i'_\alpha}) = z_{l'_\alpha}$ as fresh queries for $\alpha = 1, \ldots, r$.$^4$

Example 3. Let $H$ be based on a $[5, 3, 3]_4$ MDS code with a generator matrix

\[ G = [G_1, G_2, G_3, G_4, G_5]. \]

Let $T = (1, 5, 3)$ and $P = (i_1, i_2, i_3)$, and let $G_2 = \lambda G_1$ and $G_4 = \mu_1 G_1 + \mu_3 G_3 +
\mu_5 G_5$ for some constants $\lambda, \mu_1, \mu_3, \mu_5$ where $\lambda$ and $\mu_3$ are nonzero. Then $(T, P)$
induces $T' = (1, 2, 5, 3, 4)$ and $P' = (i_1, i_1 + 1, i_2 + 1, i_3 + 1, i_3 + 2)$. Note that
$i_2$ and $i_3$ have been replaced by $i_2 + 1$ and $i_3 + 1$ respectively in $P'$, since one
additional query has been inserted right after the $i_1$-th query.
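The bookkeeping of Example 3 can be sketched as follows. The function `induce` and its `determined_after` argument are hypothetical helpers: the latter records, for each position $\alpha$ in $T$, which function indices become determined immediately after the $\alpha$-th partial preimage, information that in the proof is derived from the generator matrix.

```python
def induce(T, P, determined_after):
    """Compute the induced sequences (T', P') of the modified adversary.

    T: function indices (l_1, ..., l_k); P: increasing query indices (i_1, ..., i_k).
    determined_after: dict alpha -> list of function indices queried by the
    modified adversary right after the alpha-th partial preimage (assumption:
    supplied by the caller rather than computed from G).
    """
    T_new, P_new, shift = [], [], 0
    for alpha, (l, i) in enumerate(zip(T, P), start=1):
        T_new.append(l)
        P_new.append(i + shift)        # earlier insertions delay this query
        for extra in determined_after.get(alpha, []):
            shift += 1                 # one additional query is inserted
            T_new.append(extra)
            P_new.append(i + shift)
    return T_new, P_new
```

For instance, `induce((1, 5, 3), (10, 20, 30), {1: [2], 3: [4]})` yields `([1, 2, 5, 3, 4], [10, 11, 21, 31, 32])`, matching $P' = (i_1, i_1 + 1, i_2 + 1, i_3 + 1, i_3 + 2)$.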

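Since $(T', P')$ fixes all query indices $i'_\alpha$ that determine a preimage of $Z$, we have

\[ \Pr[A \text{ sets } \mathrm{Pre}_{T,P} \text{ to true}] \le \Pr[A' \text{ sets } \mathrm{Pre}_{T',P'} \text{ to true}] \le \frac{1}{N^r}. \tag{5} \]

Since the number of possible choices for $(T, P)$ is at most

\[ \binom{r}{k} k! \binom{q}{k} \le \binom{r}{k} q^k, \]

and by (4), (5) we conclude

\[ \Pr[\mathrm{Pre}] \le \binom{r}{k} \frac{q^k}{N^r}. \]

To summarize this result, we have the following theorem.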

4 Without allowing a redundant query, P' is not uniquely defined from (T, P). P' 
would be different according to the point of time when a redundant query is made. 
Theorem 2. Let $H$ be the Knudsen-Preneel compression function based on an
$[r, k, d]_{2^e}$ code. Then we have

\[ \mathrm{Adv}^{\mathrm{pre}}_H(q) \le \binom{r}{k} \frac{q^k}{N^r}. \]

Example 4. Let $H$ be based on a $[5, 3, 3]_4$ MDS code. Then Theorem 2 implies

\[ \mathrm{Adv}^{\mathrm{pre}}_H(q) \le \frac{10 q^3}{N^5}. \]



4 Collision Resistance Proof 

Consider two sets of evaluations $(f_l(x^{i_l}))_{l \in [1,r]}$ and $(f_l(x^{j_l}))_{l \in [1,r]}$ of the inner
primitives for $H = \mathrm{KP}^b([r, k, d]_{2^e})$. Let $S \subset [1, r]$ and suppose that $i_l = j_l$ (and
hence $x^{i_l} = x^{j_l}$) for $l \in S$. As long as $(x^{i_l})_{l \in [1,r]}$ and $(x^{j_l})_{l \in [1,r]}$ are valid, partial
inner collisions $f_l(x^{i_l}) = f_l(x^{j_l})$ for $l \in [1, r] \setminus S$ suffice to guarantee an actual
collision of $H$ regardless of the evaluations of $f_l(x^{i_l}) \, (= f_l(x^{j_l}))$ for $l \in S$. For this
reason, we will call the indices in $S$ inactive and the other indices active. The
probability of finding a collision turns out to be closely related to the number of
inactive indices that contribute to a collision.

When a collision happens, let predicate $\mathrm{Col}$ be set to true by definition. Our
security proof begins with decomposing this predicate into subcases according
to the number of inactive indices. For $0 \le s \le r - d$, consider a subset $S \subset [1, r]$
such that $|S| = s$. With respect to this subset, we define predicate $\mathrm{Col}_S$, where
$\mathrm{Col}_S$ is true if $A$ obtains an index sequence of a collision $C = (i_l, j_l)_{l \in [1,r]}$ such
that

\[ i_l = j_l \text{ if and only if } l \in S. \]

Note that more than $r - d$ inactive inner collisions enforce $(x^{i_1}, \ldots, x^{i_r}) =
(x^{j_1}, \ldots, x^{j_r})$ since $H$ is based on a code of minimum distance $d$. Therefore
we have

\[ \mathrm{Col} \Rightarrow \bigvee_{S \subset [1,r],\; 0 \le |S| \le r-d} \mathrm{Col}_S. \tag{6} \]



4.1 Inner Collisions Compatible with Inactive Indices 

For $s < d - 1$, we will upper bound $\Pr[\mathrm{Col}_S]$ by a wish list argument. In order to
upper bound the size of a certain wish list, we need a notion of partial collisions. 
Similar to partial preimages, each partial collision will uniquely determine a wish 
in the list, so the size of the wish list is upper bounded by the number of partial 
collisions. 
Definition 4. Let $S$ and $T$ be disjoint subsets of $[1, r]$. A sequence of indices

\[ P = (i_l, j_l)_{l \in T} \]

is called a partial collision at $T$ compatible with inactive indices $S$ if

1. $1 \le i_l, j_l \le q$ are all distinct,

2. $(l, x^{i_l}), (l, x^{j_l}) \in Q$ and $f_l(x^{i_l}) = f_l(x^{j_l})$ for $l \in T$,

3. $(\Delta_l)_{l \in S \cup T}$ is valid where $\Delta_l = 0$ for $l \in S$ and $\Delta_l = x^{i_l} \oplus x^{j_l}$ for $l \in T$.

Note that even in case of $S \cup T = [1, r]$, a partial collision need not correspond to
an actual collision as $(x^{i_l})_{l \in T}$ and $(x^{j_l})_{l \in T}$ might not be valid. A partial collision
also has the following property.

Property 3. For disjoint subsets $S$ and $T \subset [1, r]$, the number of partial collisions
at $T$ compatible with inactive indices $S$ is a multiple of $2^{|T|}$.

Proof. From a single partial collision $P = (i_l, j_l)_{l \in T}$, we can obtain different
partial collisions by swapping $i_l$ and $j_l$ for each $l \in T$. Since we can define an
equivalence relation between them, the total number of partial collisions is given
as a multiple of $2^{|T|}$. □
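The counting in the proof of Property 3 can be made concrete with a short sketch. The representation of a partial collision as a dictionary and the name `swap_orbit` are assumptions made for illustration.

```python
from itertools import product


def swap_orbit(partial_collision):
    """All index sequences obtained by swapping (i_l, j_l) at any subset of T.

    partial_collision: dict l -> (i_l, j_l) with i_l != j_l (assumption).
    """
    keys = sorted(partial_collision)
    orbit = set()
    for flips in product([False, True], repeat=len(keys)):
        member = tuple(
            partial_collision[l][::-1] if flip else partial_collision[l]
            for l, flip in zip(keys, flips)
        )
        orbit.add(member)
    return orbit
```

Since $i_l \neq j_l$ for every $l \in T$, each of the $2^{|T|}$ swap patterns yields a different sequence, so every equivalence class has exactly $2^{|T|}$ members.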

By the following lemma, we can upper bound the number of partial collisions at
$T$ compatible with inactive indices $S$ for a fixed subset $T$ such that $S \cap T = \emptyset$
and $|S| + |T| \ge r - d + 1$. The proof, given in Appendix A in detail, is essentially
based on the application of Markov's inequality.

Lemma 1. Let $S$ and $T$ be disjoint subsets of $[1, r]$ such that $|S| \le r - d$ and
$|S| + |T| \ge r - d + 1$, and let $|S| = s$ and $|T| = t$. Then for $M > 0$, the
number of partial collisions at $T$ compatible with inactive indices $S$ is smaller
than $2^{t-r+d+s-1} M$ except with probability

\[ \binom{t}{r-d-s+1} \frac{q^{t+r-d-s+1}}{M N^t}. \]
4.2 Upper Bounding $\Pr[\mathrm{Col}_S]$

According to the number of inactive indices, $s = |S|$, we distinguish two cases.

Case 1. $s < d - 1$: This case is analyzed by a wish list argument.

Note that $|[1, r] \setminus S| \ge r - d + 1$. For a subset $T \subset [1, r] \setminus S$ such that $|T| = r - d + 1$,
we define predicate $\mathrm{Col}_{S,T}$ where $\mathrm{Col}_{S,T}$ is true if $A$ obtains an index sequence
of a collision $C = (i_l, j_l)_{l \in [1,r]}$ such that

1. $i_l = j_l$ if and only if $l \in S$,

2. $\max_{l \in T} \{i_l, j_l\} < \min_{l \in [1,r] \setminus (S \cup T)} \{\max\{i_l, j_l\}\}$.
Thus $T$ specifies the indices where the first $r - d + 1$ active inner collisions are
completed. For $M > 0$, we define predicate $\mathrm{Bad}(M)$ where $\mathrm{Bad}(M)$ is true if
there exists a subset $T \subset [1, r] \setminus S$ of size $r - d + 1$ such that the number of
partial collisions at $T$ compatible with inactive indices $S$ is greater than
$2^s M$. Then by Lemma 1 (with $t = r - d + 1$) and a union bound, we have

\[ \Pr[\mathrm{Bad}(M)] \le \binom{r-s}{r-d+1} \binom{r-d+1}{s} \frac{q^{2(r-d+1)-s}}{M N^{r-d+1}}. \tag{7} \]

In order to upper bound $\Pr[\mathrm{Col}_S]$, we will use the following implication.

\[ \mathrm{Col}_S \Rightarrow \mathrm{Bad}(M) \vee \bigvee_{T \subset [1,r] \setminus S,\; |T| = r-d+1} \left( \neg \mathrm{Bad}(M) \wedge \mathrm{Col}_{S,T} \right). \tag{8} \]


Now we will focus on upper bounding $\Pr[\neg \mathrm{Bad}(M) \wedge \mathrm{Col}_{S,T}]$ for fixed subsets
$S$ and $T$. In order for $A$ to set $\mathrm{Col}_{S,T}$ to true, $A$ has to first complete a partial
collision at $T$ compatible with inactive indices $S$. At the point when a partial
collision $P = (i_l, j_l)_{l \in T}$ is completed, the remaining queries $(x_l, x'_l)_{l \in [1,r] \setminus (S \cup T)}$
that could make a collision along with $P$ are uniquely determined. (They exist
only if $(x^{i_l})_{l \in T}$ and $(x^{j_l})_{l \in T}$ are valid.) If

1. $x_l \neq x'_l$ for $l \in [1, r] \setminus (S \cup T)$,

2. any of collisions of $f_l(x_l)$ and $f_l(x'_l)$ has not been determined for $l \in
[1, r] \setminus (S \cup T)$,

then we include $(x_l, x'_l)_{l \in [1,r] \setminus (S \cup T)}$ into a wish list $\mathcal{L}$, expecting all of the collisions
to happen sometime later. A single query might include multiple
wishes into $\mathcal{L}$ by completing multiple partial collisions. However,
a single partial collision is associated with a unique element in $\mathcal{L}$. Therefore
without the occurrence of $\mathrm{Bad}(M)$, the size of $\mathcal{L}$ is at most $2^s M$, and we have the
following upper bound.

\[ \Pr[\neg \mathrm{Bad}(M) \wedge \mathrm{Col}_{S,T}] \le \sum_{i=1}^{2^s M} \Pr[\text{the } i\text{-th wish is granted}]. \tag{9} \]

Since

\[ \Pr[\text{the } i\text{-th wish is granted}] \le \frac{1}{N^{|[1,r] \setminus (S \cup T)|}} = \frac{1}{N^{d-s-1}} \]

for each $i = 1, \ldots, 2^s M$, and by (7), (8), (9), we have

\[ \Pr[\mathrm{Col}_S] \le \binom{r-s}{r-d+1} \left( \binom{r-d+1}{s} \frac{q^{2(r-d+1)-s}}{M N^{r-d+1}} + \frac{2^s M}{N^{d-s-1}} \right). \tag{10} \]
Case 2. $s \ge d - 1$: This case might occur when $d - 1 \le r - d$. Let $T = [1, r] \setminus S$.
In this case, $\mathrm{Col}_S$ implies that there is a partial collision at $T$ compatible with
inactive indices $S$. Here we can use Lemma 1 with $M = 1$ and $t = r - s$
since the number of partial collisions should be a multiple of $2^{|T|} = 2^{r-s}$ but
$2^{t-r+d+s-1} \, (= 2^{d-1})$ is smaller than $2^{r-s}$. Therefore we have

\[ \Pr[\mathrm{Col}_S] \le \Pr[\text{there is a partial collision at } T \text{ compatible with inactive indices } S] \le \binom{r-s}{r-d-s+1} \frac{q^{2(r-s)-d+1}}{N^{r-s}}. \tag{11} \]

4.3 Putting the Pieces Together 

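By (6), (10) and (11), we obtain the following result.

\[ \Pr[\mathrm{Col}] \le \sum_{s=0}^{\min\{d-2,\, r-d\}} \binom{r}{s} \binom{r-s}{r-d+1} \left( \binom{r-d+1}{s} \frac{q^{2(r-d+1)-s}}{M(s) N^{r-d+1}} + \frac{2^s M(s)}{N^{d-s-1}} \right) + \sum_{s=d-1}^{r-d} \binom{r}{s} \binom{r-s}{r-d-s+1} \frac{q^{2(r-s)-d+1}}{N^{r-s}}, \]

where the parameter $M(s)$ might depend on the size of $S$ and the second term
of the right hand side appears only when $d - 1 \le r - d$. In order to optimize the
right hand side of the inequality, set

\[ M(s) = \left( \binom{r-d+1}{s} \frac{q^{2(r-d+1)-s}}{2^s N^{r-2d+s+2}} \right)^{1/2} \]

by solving

\[ \binom{r-d+1}{s} \frac{q^{2(r-d+1)-s}}{M(s) N^{r-d+1}} = \frac{2^s M(s)}{N^{d-s-1}}. \]

Then we have the following theorem.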

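Theorem 3. Let $H$ be the Knudsen-Preneel compression function based on an
$[r, k, d]_{2^e}$ code. Then we have

\[ \mathrm{Adv}^{\mathrm{col}}_H(q) \le \sum_{s=0}^{\min\{d-2,\, r-d\}} 2^{\frac{s}{2}+1} \binom{r}{s} \binom{r-s}{r-d+1} \binom{r-d+1}{s}^{\frac{1}{2}} \frac{q^{r-d+1-\frac{s}{2}}}{N^{\frac{r-s}{2}}} + \sum_{s=d-1}^{r-d} \binom{r}{s} \binom{r-s}{r-d-s+1} \frac{q^{2(r-s)-d+1}}{N^{r-s}}. \]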

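Interpretation. Let $d - 1 \le r - d$ or equivalently $2d \le r + 1$. Assuming
$N^{1/2} \le q \le N$, we have

\[ \sum_{s=0}^{d-2} 2^{\frac{s}{2}+1} \binom{r}{s} \binom{r-s}{r-d+1} \binom{r-d+1}{s}^{\frac{1}{2}} \frac{q^{r-d+1-\frac{s}{2}}}{N^{\frac{r-s}{2}}} = O\!\left( \frac{q^{r-\frac{3d}{2}+2}}{N^{\frac{r-d}{2}+1}} \right), \]

and

\[ \sum_{s=d-1}^{r-d} \binom{r}{s} \binom{r-s}{r-d-s+1} \frac{q^{2(r-s)-d+1}}{N^{r-s}} = O\!\left( \frac{q^{2r-3d+3}}{N^{r-d+1}} \right). \]

In this case, $H$ is collision resistant up to $N^{\frac{r-d+1}{2r-3d+3}}$ query complexity since
$N^{\frac{1}{2}} \le N^{\frac{r-d+1}{2r-3d+3}} \le N$.

Let $2d > r + 1$. Assuming $q \ge N$, we have

\[ \sum_{s=0}^{r-d} 2^{\frac{s}{2}+1} \binom{r}{s} \binom{r-s}{r-d+1} \binom{r-d+1}{s}^{\frac{1}{2}} \frac{q^{r-d+1-\frac{s}{2}}}{N^{\frac{r-s}{2}}} = O\!\left( \frac{q^{r-d+1}}{N^{\frac{r}{2}}} \right). \]

In this case, $H$ is collision resistant up to $N^{\frac{r}{2r-2d+2}}$ query complexity since
$N \le N^{\frac{r}{2r-2d+2}}$.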

We summarize this result as follows. 


Corollary 1. Let $H$ be the Knudsen-Preneel compression function based on an
$[r, k, d]_{2^e}$ code.

(a) If $2d \le r + 1$, then $H$ is collision resistant up to $N^{\frac{r-d+1}{2r-3d+3}}$ query complexity.

(b) If $2d > r + 1$, then $H$ is collision resistant up to $N^{\frac{r}{2r-2d+2}}$ query complexity.


Corollary 2. Let $H$ be the Knudsen-Preneel compression function based on an
$[r, k, d]_{2^e}$ MDS code.

(a) If $r + 1 \le 2k$, then $H$ is collision resistant up to $N^{\frac{k}{3k-r}}$ query complexity.

(b) If $r + 1 > 2k$, then $H$ is collision resistant up to $N^{\frac{r}{2k}}$ query complexity.
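The two corollaries can be evaluated mechanically. The sketch below assumes the thresholds $N^{(r-d+1)/(2r-3d+3)}$ for $2d \le r + 1$ and $N^{r/(2r-2d+2)}$ otherwise, and uses $d = r - k + 1$ for MDS codes; the function names are illustrative.

```python
from fractions import Fraction


def collision_exponent(r, d):
    """Exponent t such that collision resistance holds up to about N^t queries."""
    if 2 * d <= r + 1:
        return Fraction(r - d + 1, 2 * r - 3 * d + 3)
    return Fraction(r, 2 * r - 2 * d + 2)


def collision_exponent_mds(r, k):
    """Same exponent for an MDS code, where d = r - k + 1."""
    return collision_exponent(r, r - k + 1)
```

For a $[5, 3, 3]$ code both functions give `Fraction(3, 4)`, and for MDS parameters with $r + 1 > 2k$ the result agrees with $r/(2k)$.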


Example 5. Let $H$ be based on a $[5, 3, 3]_4$ MDS code. Then

\[ \mathrm{Adv}^{\mathrm{col}}_H(q) \le \frac{20 q^3}{N^{5/2}} + \frac{40 \sqrt{6}\, q^{5/2}}{N^2} + \frac{30 q^4}{N^3}. \]

Therefore $H$ is collision resistant up to $N^{3/4}$ query complexity.
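The coefficients in Example 5 can be recomputed term by term. This is a minimal sketch, assuming the collision bound is a sum over the number $s$ of inactive indices with the wish-list form $2^{s/2+1}\binom{r}{s}\binom{r-s}{r-d+1}\binom{r-d+1}{s}^{1/2} q^{r-d+1-s/2}/N^{(r-s)/2}$ for $s \le d - 2$ and the form $\binom{r}{s}\binom{r-s}{d-1} q^{2(r-s)-d+1}/N^{r-s}$ for $s \ge d - 1$; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb, isclose, sqrt


def theorem3_terms(r, d):
    """Return a list of (coefficient, q-exponent, N-exponent), one per s in [0, r-d]."""
    terms = []
    for s in range(0, r - d + 1):
        if s <= d - 2:  # Case 1: wish list argument with the balanced M(s)
            coeff = (2 ** (s / 2 + 1) * comb(r, s) * comb(r - s, r - d + 1)
                     * sqrt(comb(r - d + 1, s)))
            terms.append((coeff, Fraction(2 * (r - d + 1) - s, 2), Fraction(r - s, 2)))
        else:           # Case 2: a partial collision at all active indices
            coeff = float(comb(r, s) * comb(r - s, d - 1))
            terms.append((coeff, Fraction(2 * (r - s) - d + 1), Fraction(r - s)))
    return terms
```

For $r = 5$, $d = 3$ the three terms carry the coefficients $20$, $40\sqrt{6}$ and $30$, and the smallest ratio of $N$-exponent to $q$-exponent is $3/4$, in line with the stated query complexity.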


A Proof of Lemma 1

Let $T = U \cup V$ be a disjoint decomposition of $T$ such that $|S| + |U| = r - d + 1$.
Let $\mathcal{D}[U, V]$ be the set of index sequences

\[ D = ((i_l, j_l)_{l \in U}, (h_l)_{l \in V}) \]

such that

1. $1 \le i_l, j_l, h_l \le q$ are all distinct,

2. $\max_{l \in U} \{i_l, j_l\} < \min_{l \in V} \{h_l\}$.

For a sequence $D = ((i_l, j_l)_{l \in U}, (h_l)_{l \in V}) \in \mathcal{D}[U, V]$, we define a random variable
$X_D$ where $X_D = 1$ if there is a sequence $(i_l, j_l)_{l \in V}$ such that

1. $\max\{i_l, j_l\} = h_l$ for $l \in V$,

2. $P = (i_l, j_l)_{l \in U \cup V}$ is a partial collision at $T$ compatible with inactive indices
$S$,

and $X_D = 0$ otherwise. The condition

\[ \max_{l \in U} \{i_l, j_l\} < \min_{l \in V} \{h_l\} = \min_{l \in V} \{\max\{i_l, j_l\}\} \]

implies that the inner collisions at $V$ are completed after the inner collisions at $U$.
Therefore for $D = ((i_l, j_l)_{l \in U}, (h_l)_{l \in V}) \in \mathcal{D}[U, V]$, $\Pr[X_D = 1]$ is the probability
that

1. For $l \in U$, $f_l(x^{i_l}) = f_l(x^{j_l})$.

2. For $l \in V$, $f_l(x^{h_l}) = f_l(x^{h_l} \oplus \Delta_l)$, where

(a) $(\Delta_l)_{l \in [1,r]}$ is a unique extension of $(\Delta_l)_{l \in S \cup U}$, where $\Delta_l = 0$ for $l \in S$
and $\Delta_l = x^{i_l} \oplus x^{j_l}$ for $l \in U$ (by Property 2),$^5$

(b) $f_l(x^{h_l} \oplus \Delta_l)$ has been queried before the $h_l$-th query.

Since $t$ inner collisions are necessary for $X_D = 1$, we have

\[ \Pr[X_D = 1] = \mathrm{Ex}(X_D) \le \frac{1}{N^t}. \]

Let

\[ X = \sum_{\substack{U \cup V = T \\ U \cap V = \emptyset \\ |S| + |U| = r-d+1}} \; \sum_{D \in \mathcal{D}[U, V]} X_D. \]

5 If the extension $(\Delta_l)_{l \in [1,r]}$ does not exist, then $\Pr[X_D = 1] = 0$.
Since the number of possible decompositions of $T = U \cup V$ such that $U \cap V = \emptyset$
and $|S| + |U| = r - d + 1$ is $\binom{t}{r-d-s+1}$ and

\[ |\mathcal{D}[U, V]| \le q^{2|U| + |V|} = q^{|T| + |U|} = q^{t + (r-d+1) - s} \]

for each decomposition, we have

\[ \mathrm{Ex}(X) \le \binom{t}{r-d-s+1} q^{t+r-d-s+1} \, \mathrm{Ex}(X_D) \le \binom{t}{r-d-s+1} \frac{q^{t+r-d-s+1}}{N^t}. \]

Using Markov's inequality, for $M > 0$ we have

\[ \Pr[X \ge M] \le \binom{t}{r-d-s+1} \frac{q^{t+r-d-s+1}}{M N^t}. \tag{12} \]

Let $P = (i_l, j_l)_{l \in T}$ be a partial collision at $T$ compatible with inactive indices
$S$. Then we always have a unique disjoint decomposition of $T = U \cup V$ such that
$|U| = r - d - s + 1$ and

\[ \max_{l \in U} \{i_l, j_l\} < \min_{l \in V} \{\max\{i_l, j_l\}\}. \]

In this case, we have $X_D = 1$ for $D = ((i_l, j_l)_{l \in U}, (h_l)_{l \in V})$ where $h_l =
\max\{i_l, j_l\}$. If we regard this association of $P$ with $D$ as a mapping, then exactly
$2^{|V|} \, (= 2^{t-(r-d-s+1)})$ different partial collisions would be mapped to the same
sequence $D$ since $(i_l, j_l)$ can be replaced by $(j_l, i_l)$ for each index $l \in V$ without
changing the image of this mapping. Therefore the inequality (12) implies that
the number of partial collisions at $T$ compatible with inactive indices $S$ is at
most $2^{t-r+d+s-1} M$ except with probability $\binom{t}{r-d-s+1} q^{t+r-d-s+1} / (M N^t)$.
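The mapping from a partial collision $P$ to a sequence $D$ used at the end of the proof can be sketched as follows. The dictionary representation and the name `decompose` are assumptions made for illustration; the split simply orders the inner collisions by completion time $\max\{i_l, j_l\}$.

```python
def decompose(partial_collision, u_size):
    """Split T into (U, V, h), where U holds the u_size inner collisions completed first.

    partial_collision: dict l -> (i_l, j_l) with all indices distinct (assumption).
    Returns sorted U, sorted V, and h = {l: max(i_l, j_l) for l in V}.
    """
    by_completion = sorted(partial_collision, key=lambda l: max(partial_collision[l]))
    U, V = by_completion[:u_size], by_completion[u_size:]
    h = {l: max(partial_collision[l]) for l in V}
    return sorted(U), sorted(V), h
```

Swapping $i_l$ and $j_l$ for an index $l \in V$ leaves $h_l$ unchanged, which is why $2^{|V|}$ partial collisions share the same image.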
Abstract. The idea of double block length hashing is to construct a
compression function on $2n$ bits using a block cipher with an $n$-bit block
size. All optimally secure double length hash functions known in the
literature employ a cipher with a key space of double block size, $2n$-bit.

On the other hand, no optimally secure compression functions built from
a cipher with an $n$-bit key space are known. Our work deals with this
problem. Firstly, we prove that for a wide class of compression functions
with two calls to its underlying $n$-bit keyed block cipher collisions can
be found in about $2^{n/2}$ queries. This attack applies, among others, to
functions where the output is derived from the block cipher outputs in
a linear way. This observation demonstrates that all security results of
designs using a cipher with $2n$-bit key space crucially rely on the presence
of these extra $n$ key bits. The main contribution of this work is a proof
that this issue can be resolved by allowing the compression function to
make one extra call to the cipher. We propose a family of compression
functions making three block cipher calls that asymptotically achieves
optimal collision resistance up to $2^{n(1-\varepsilon)}$ queries and preimage resistance
up to $2^{3n(1-\varepsilon)/2}$ queries, for any $\varepsilon > 0$. To our knowledge, this is the first
optimally collision secure double block length construction using a block
cipher with single length key space.

1 Introduction 

Double (block) length hashing is a well-established method for constructing a 
compression function with 2n-bit output based only on n-bit block ciphers. The 
idea of double length hashing dates back to the work of Meyer and Schilling 
with the introduction of the MDC-2 and MDC-4 compression functions 
in 1988. In recent years, the design methodology got renewed attention in the 
works of [28417191 1 ( )1 1 21 1 d!2 H2 7 j . Double length hash functions have an obvious 
advantage over classical block cipher based functions such as Davies-Meyer and 
Matyas-Meyer-Oseas [2 212b j : the same type of underlying primitive allows for a 
larger compression function. Yet, for double length compression functions it is 
harder to achieve optimal n-bit collision and 2n-bit preimage security. 

We focus on the simplest and most-studied type of compression functions,
namely functions that compress $3n$ to $2n$ bits. Those can be classified into two
classes: compression functions that internally evaluate a $2n$-bit keyed block cipher
$E : \{0,1\}^{2n} \times \{0,1\}^n \to \{0,1\}^n$ (which we will call the $\mathrm{DBL}^{2n}$ class), and

X. Wang and K. Sako (Eds.): ASIACRYPT 2012, LNCS 7658, pp. 526- 15151 2012. 
ones that employ an $n$-bit keyed block cipher $E : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}^n$
(the $\mathrm{DBL}^n$ class). The $\mathrm{DBL}^{2n}$ class is well understood. It includes the classical
compression functions Tandem-DM and Abreast-DM jH| and Hirose’s function 
0, as well as Stam’s supercharged single call Type-I compression function de- 
sign j25!2fi| (reconsidered in fTIj l and the generalized designs by Hirose |£j and 
Ozen and Stam [21] . As illustrated in Table [I] all of these functions provide op- 
timal collision security guarantees (up to about 2" queries), and Tandem-DM, 
Abreast-DM, and Hirose’s function are additionally proven optimally preimage 
resistant (up to about 2 2 " queries). These bounds also hold in the iteration, 
when a proper domain extender is applied |T[ . Lucks [15! introduced a com- 
pression function that allows for collisions in about 2"/ 2 queries, but achieves 
optimal collision resistance in the iteration. Members of the DBL" class are 
the MDC-2 and MDC-4 compression functions ca, the MJH construction m, 
and a construction by Jetchev et al. [21 • For the MDC-2 and MJH compression 
functions ^collisions and preimages can be found in about 2"/ 2 and 2" queries, re- 
spectiveljQ. The MDC-4 compression function achieves a higher level of collision 
and preimage resistance than MDC-2 m, but contrary to the other functions 
it makes four block cipher calls. Jetchev et al.’s construction makes two block 
cipher calls and achieves 2 2 "/ 3 collision security. Stam also introduced a design 
based on two calls, and proved it optimally collision secure in a restricted se- 
curity model where the adversary must fix its queries in advance. Therefore we 
did not include this design in the table. Further related results include the work 
of Nandi et al. j2D[, who presented a 3n-to-2n-bit compression function making 
three calls to a 2«-to-n-bit one-way function, achieving collision security up to 
2 2 "/ 3 queries. They extended this result to a 4n-to-2n-bit function using three 
2n-bit keyed block ciphers. 

Unlike the $\mathrm{DBL}^{2n}$ class, for the $\mathrm{DBL}^n$ class no optimally secure compression
function is known. The situation is the same for the iteration, where none of
these designs has been proven to achieve optimal security. The decisive factor behind this
gap is the difference in the underlying primitive: in the $\mathrm{DBL}^{2n}$ class, the underlying
primitive maps $3n$ bits to $n$ bits and thus allows for more compression. In
particular, if we consider Tandem-DM, Abreast-DM, and Hirose’s function, the 
first cipher call already compresses the entire input to the compression function, 
and the second cipher call is simply used to assure a 2n-bit output. In fact, these 
designs achieve their level of security merely due to this property, for their proofs 
crucially rely on this (see also Sect. E[). 

Thus, from a theoretical point of view it is unreasonable to compare $\mathrm{DBL}^{2n}$
and $\mathrm{DBL}^n$. But the gap between the two classes leaves us with an interesting
open problem: starting from a single block cipher $E : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}^n$,
is it possible to construct a double length compression function that achieves
optimal collision and preimage security? This is the central research question
of this work. Note that Stam's bound does not help us here: it claims that
collisions can be found in at most $(2^n)^{(2r-1)/(r+1)}$ queries, where $r$ denotes the


1 In the iteration collision resistance is proven up to $2^{3n/5}$ queries for MDC-2 and
$2^{2n/3 - \log n}$ queries for MJH.
Table 1. Asymptotic ideal cipher model security guarantees of known double length
compression functions in the classes $\mathrm{DBL}^{2n}$ (first) and $\mathrm{DBL}^n$ (next). A more detailed
comparison of some of these functions can be found in 0 App. A]. 


compression        E-calls   collision   preimage   underlying
function                     security    security   cipher

Lucks'             1         2^{n/2}     2^n        DBL^{2n}
Stam's             1         2^n         2^n        DBL^{2n}
Tandem-DM          2         2^n         2^{2n}     DBL^{2n}
Abreast-DM         2         2^n         2^{2n}     DBL^{2n}
Hirose's           2         2^n         2^{2n}     DBL^{2n}
Hirose-class       2         2^n         2^n        DBL^{2n}
Ozen-Stam-class    2         2^n         2^n        DBL^{2n}
MDC-2              2         2^{n/2}     2^n        DBL^n
number of block cipher calls, which results in the trivial bound for $r \ge 2$. For
$r \ge 2$, denote by $F^r : \{0,1\}^{3n} \to \{0,1\}^{2n}$ a compression function that makes $r$
calls to its primitive $E$.
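The remark that Stam's bound becomes trivial for $r \ge 2$ is a matter of simple arithmetic on the exponent. The following sketch assumes the bound has the form $(2^n)^{(2r-1)/(r+1)}$ and compares it with the generic birthday bound $2^n$ for a $2n$-bit output; the function names are illustrative.

```python
from fractions import Fraction


def stam_collision_exponent(r):
    """Exponent e such that the bound reads (2^n)^e for r block cipher calls."""
    return Fraction(2 * r - 1, r + 1)


def is_trivial(r):
    """True if the bound is no better than the birthday bound 2^n."""
    return stam_collision_exponent(r) >= 1
```

The exponent equals $1/2$ for $r = 1$, $1$ for $r = 2$ and $5/4$ for $r = 3$, so from two calls onward the bound does not rule out optimal collision security.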

As a first contribution, we consider $F^2$, and prove that for a very large class
of functions of this form one expects collisions in approximately $2^{n/2}$ queries.
Covered by the attack are, among others, designs with a linear finalization function
(the function that produces the $2n$-bit output given the $3n$-bit input and the
block cipher responses). We note that the compression function by Jetchev et
al. is not vulnerable to the attack due to its non-linear finalization function.
Nevertheless, these results strengthen the claim that no practical optimally collision
secure $F^2$ function exists. Motivated by this, we increase the number of
calls to $E$, and consider $F^3$. In this setting, we derive a family of compression
functions which we prove asymptotically optimal collision resistant up to $2^{n(1-\varepsilon)}$
queries and preimage resistant up to $2^{3n(1-\varepsilon)/2}$ queries, for any $\varepsilon > 0$. Our compression
function family, thus, achieves the same level of collision security as the
well-established Tandem-DM, Abreast-DM, and Hirose's function, albeit based
on a much weaker assumption. In the $\mathrm{DBL}^n$ class, our design clearly compares
favorably to MDC-4 that makes four block cipher evaluations, and from a provable
security point of view it beats MDC-2 and MJH. Still, an extra $E$ evaluation
has to be made, which results in an efficiency loss. The introduced class of compression
functions is simple and easy to understand: they are defined by $4 \times 4$
matrices over the field $\mathrm{GF}(2^n)$ which are required to comply with easily satisfied
conditions. Two example compression functions in this class are given in Fig. 1.
The security proofs of our compression function family rely on basic principles 
from previous proofs, but in order to accomplish optimal collision security (and 
Fig. 1. Two example compression functions from the family of functions introduced
and evaluated in this work. For these constructions, all wires carry $n = 128$ bits, and
the arithmetic is done over $\mathrm{GF}(2^{128})$. We further elaborate on these designs and their
derivations in Sect. 4.

as our designs use $n$-bit keyed block ciphers) our proofs have become significantly
more complex. The security proofs of all known $\mathrm{DBL}^{2n}$ functions (see Table 1)
crucially rely on the property that one block cipher evaluation defines the input
to the second one. For $F^3$ this cannot be achieved as each primitive call fixes
at most $2n$ bits of the function input. Although one may expect this to cause
an optimal proof to become unlikely, this is not the case. Using a new proof
approach, in which we apply the methodology of "wish lists" (by Armknecht et
al. and Lee et al.) to collision resistance, we manage to achieve asymptotically
close to $2^n$ collision security for our family of functions. Nonetheless, the
bound on preimage resistance does not reach the optimal level of $2^{2n}$ queries. One
can see this as the price we pay for using single key length rather than double key
length block ciphers: a straightforward generalization of the pigeonhole-birthday
attack of Rogaway and Steinberger shows that, when the compression function
behaves "sufficiently random", one may expect a preimage in approximately
$2^{5n/3}$ queries (cf. Sect. 3). The asymptotic preimage bound of $2^{3n/2}$ found in this
work closely approaches this generic bound.

Outline. We present and formalize the security model in Sect. 2. Then, in Sect. 3,
we derive our impossibility result on F_2. We propose and analyze our family of
compression functions in Sects. 4 and 5. This work is concluded in Sect. 6.

2 Security Model 

For n ≥ 1, we denote by Bloc(n) the set of all block ciphers with a key and
message space of n bits. Let E ∈ Bloc(n). For r ≥ 1, let F_r : {0,1}^{3n} → {0,1}^{2n}
be a double length compression function making r calls to its block cipher E.
We can represent F_r by mappings f_i : {0,1}^{(i+2)n} → {0,1}^{2n} for i = 1, …, r+1
as follows:

    F_r(u, v, w):
      for i = 1, …, r:
        (k_i, m_i) ← f_i(u, v, w; c_1, …, c_{i−1}),
        c_i ← E(k_i, m_i),
      return (y, z) ← f_{r+1}(u, v, w; c_1, …, c_r).

For r = 3, the F_r compression function design is depicted in Fig. 2. This generic
design is a generalization of the permutation based hash function construction
described by Rogaway and Steinberger [24]. In fact, it is straightforward to
generalize the main findings of [24] to our F_r design, and we state them as
preliminary results. If the collision- and preimage-degeneracies are sufficiently small
(these values intuitively capture the degree of non-randomness of the design
with respect to the occurrence of collisions and preimages), one can expect
collisions after approximately 2^{n(2−2/r)} queries and preimages after approximately
2^{n(2−1/r)} queries. We refer to [24] for the details. First of all, these findings
confirm that at least two cipher calls are required to get 2^n collision resistance.
More importantly, from these results we can conclude that F_r cannot
achieve optimal 2^{2n} preimage resistance. Yet, it may still be possible to
construct a function that achieves optimal collision resistance and almost-optimal
preimage resistance.
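
As a quick numerical illustration of these generic estimates, the following Python sketch tabulates the exponents 2 − 2/r and 2 − 1/r (in units of n) for small r. The function name `generic_exponents` is ours, and capping the collision exponent at 1 is our addition: it reflects the birthday bound on a 2n-bit output, not a statement taken from [24].

```python
from fractions import Fraction

def generic_exponents(r):
    """Exponents (in units of n) of the generic collision and preimage
    estimates 2^{n(2-2/r)} and 2^{n(2-1/r)} for r cipher calls."""
    r = Fraction(r)
    collision = min(2 - 2 / r, Fraction(1))  # never above the birthday bound 2^n
    preimage = 2 - 1 / r
    return collision, preimage

for r in (1, 2, 3):
    col, pre = generic_exponents(r)
    print(f"r = {r}: collisions ~ 2^({col} n), preimages ~ 2^({pre} n)")
```

For r = 3 the preimage exponent is 5/3, which is the generic 2^{5n/3} figure quoted in the introduction.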

Throughout, we consider security in the ideal cipher model: we consider an
adversary A that is a probabilistic algorithm with oracle access to a block cipher
E ←$ Bloc(n) randomly sampled from Bloc(n). A is information-theoretic: it
has unbounded computational power, and its complexity is measured by the
number of queries made to its oracles. The adversary can make forward queries
and inverse queries to E, and these are stored in a query history Q as indexed
tuples of the form (k_i, m_i, c_i), where k_i denotes the key input, and (m_i, c_i) the
plaintext/ciphertext pair. For q ≥ 0, we denote by Q_q the query history after q
queries. We assume that the adversary never makes queries to which it knows
the answer in advance.
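
The oracle of this model can be simulated by lazy sampling. The Python sketch below is one possible way to do so; the class name `IdealCipher` and its methods are hypothetical names of ours. It keeps one partial permutation per key and records every new query in a history list that plays the role of Q.

```python
import secrets

class IdealCipher:
    """Lazily sampled ideal cipher on n-bit keys and blocks with a query history."""

    def __init__(self, n):
        self.n = n
        self.enc_table = {}   # key -> {plaintext: ciphertext}
        self.dec_table = {}   # key -> {ciphertext: plaintext}
        self.history = []     # the query history Q as tuples (k, m, c)

    def _fresh(self, taken):
        # Sample uniformly among the n-bit values not yet used under this key.
        while True:
            x = secrets.randbits(self.n)
            if x not in taken:
                return x

    def _record(self, k, m, c):
        self.enc_table.setdefault(k, {})[m] = c
        self.dec_table.setdefault(k, {})[c] = m
        self.history.append((k, m, c))

    def enc(self, k, m):
        """Forward query E(k, m)."""
        if m not in self.enc_table.get(k, {}):
            self._record(k, m, self._fresh(self.dec_table.get(k, {})))
        return self.enc_table[k][m]

    def dec(self, k, c):
        """Inverse query E^{-1}(k, c)."""
        if c not in self.dec_table.get(k, {}):
            self._record(k, self._fresh(self.enc_table.get(k, {})), c)
        return self.dec_table[k][c]
```

Repeated queries are answered from the tables and are not appended to the history, which matches the assumption that the adversary never asks a query whose answer it already knows.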



Fig. 2. F_3 : {0,1}^{3n} → {0,1}^{2n} making three block cipher evaluations

A collision-finding adversary A for F_r aims at finding two distinct inputs
to F_r that compress to the same range value. In more detail, we say that A
succeeds if it finds two distinct tuples (u,v,w), (u',v',w') such that F_r(u,v,w) =
F_r(u',v',w') and Q contains all queries required for these evaluations of F_r. We
define by

    adv^{col}_{F_r}(A) = Pr( E ←$ Bloc(n), (u,v,w), (u',v',w') ← A^{E,E^{-1}} :
                             (u,v,w) ≠ (u',v',w') ∧ F_r(u,v,w) = F_r(u',v',w') )

the probability that A succeeds in this. By adv^{col}_{F_r}(q) we define the maximum
collision advantage taken over all adversaries making q queries.

For preimage resistance, we focus on everywhere preimage resistance [23],
which captures preimage security for every point of {0,1}^{2n}. Before making any
queries to its oracle, a preimage-finding adversary A first decides on a range
point (y,z) ∈ {0,1}^{2n}. Then, we say that A succeeds in finding a preimage if it
obtains a tuple (u,v,w) such that F_r(u,v,w) = (y,z) and Q contains all queries
required for this evaluation of F_r. We define by

    adv^{epre}_{F_r}(A) = max_{(y,z) ∈ {0,1}^{2n}} Pr( E ←$ Bloc(n), (u,v,w) ← A^{E,E^{-1}}(y,z) :
                                                      F_r(u,v,w) = (y,z) )

the probability that A succeeds, maximized over all possible choices for (y,z).
By adv^{epre}_{F_r}(q) we define the maximum (everywhere) preimage advantage taken
over all adversaries making q queries.


3 Impossibility Result for 2-Call Double Length Hashing 

We present an attack on a wide class of double block length compression functions
with two calls to their underlying block cipher E : {0,1}^n × {0,1}^n → {0,1}^n.
Let F_2 be a compression function of this form. We pose a condition
on the finalization function f_3, such that if this condition is satisfied, collisions
for F_2 can be found in about 2^{n/2} queries. Although we are not considering all
possible compression functions, we cover the most interesting and intuitive ones,
such as compression functions with linear finalization function f_3. Compression
functions with non-linear f_3 are covered up to some degree (but we note that
the attack does not apply to the compression function of [7], for which collision
security up to 2^{2n/3} queries is proven).

We first state the attack. Then, by way of examples, we illustrate its generality.
For the purpose of the attack, we introduce the function left_n, which on
input of a bit string of length 2n bits outputs the leftmost n bits.

Proposition 1. Let F_2 : {0,1}^{3n} → {0,1}^{2n} be a compression function as
described in Sect. 2. Suppose there exists a bijective function L such that for
any u, v, w, c_1, c_2 ∈ {0,1}^n we have

    left_n ∘ L ∘ f_3(u, v, w; c_1, c_2) = left_n ∘ L ∘ f_3(u, v, w; c_1, 0).        (1)

Then, one can expect collisions for F_2 after 2^{n/2} queries.

Proof. Let F_2 be a compression function and let L be a bijection such that (1)
holds. First, we consider the case of L being the identity function, and next we
show how this attack extends to the case where L is an arbitrary bijection.

Suppose (1) holds with L the identity function. This means that the first n
bits of f_3(u, v, w; c_1, c_2) do not depend on c_2, and we can write f_3 as a
concatenation of two functions g_1 : {0,1}^{4n} → {0,1}^n and g_2 : {0,1}^{5n} → {0,1}^n as
f_3(u, v, w; c_1, c_2) = g_1(u, v, w; c_1) ‖ g_2(u, v, w; c_1, c_2). Let α ∈ ℕ. We present an
adversary A for F_2. The first part of the attack is derived from [24].

• Make α queries (k_1, m_1) → c_1 that maximize the number of tuples (u, v, w)
with f_1(u, v, w) hitting any of these values (k_1, m_1). By the balls-and-bins
principle (footnote 2), the adversary obtains at least α · 2^{3n}/2^{2n} = α2^n tuples
(u, v, w; c_1) for which it knows the first block cipher evaluation;

• Again by the balls-and-bins principle, there exists a value y such that at
least α tuples satisfy g_1(u, v, w; c_1) = y;

• Varying over these α tuples, compute (k_2, m_2) = f_2(u, v, w; c_1) and query
(k_2, m_2) to the cipher to obtain a c_2. A finds a collision for F_2 if it obtains
two tuples (u, v, w; c_1, c_2), (u', v', w'; c'_1, c'_2) that satisfy
g_2(u, v, w; c_1, c_2) = g_2(u', v', w'; c'_1, c'_2).

In the last round one expects to find a collision if α^2/2^n = 1, or equivalently if
α = 2^{n/2}. In total, the attack is done in approximately 2 · 2^{n/2} queries.
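
The following Python sketch plays this attack through on a toy instance. Everything concrete in it is an assumption of ours: the block size n = 8, the seeded per-key permutations standing in for the ideal cipher, and the particular 2-call function `toy_f2`, chosen so that its first output word does not depend on c_2 (condition (1) with L the identity). Unlike the description above, the sketch interleaves the first and second cipher calls instead of running them in separate rounds.

```python
import random

N = 8                       # toy block size in bits (the paper uses n = 128)
SIZE = 1 << N

class ToyCipher:
    """Seeded random permutation per key, standing in for an ideal cipher."""

    def __init__(self, seed=0):
        self.seed, self.perms, self.queries = seed, {}, 0

    def enc(self, k, m):
        if k not in self.perms:
            perm = list(range(SIZE))
            random.Random(self.seed * SIZE + k).shuffle(perm)
            self.perms[k] = perm
        self.queries += 1   # counts oracle calls, including repeated ones
        return self.perms[k][m]

def toy_f2(E, u, v, w):
    """Hypothetical 2-call function: y = g1(u,v,w;c1) ignores c2."""
    c1 = E.enc(u, v)                 # (k1, m1) = f1(u, v, w) = (u, v)
    c2 = E.enc(v ^ c1, w)            # (k2, m2) = f2(u, v, w; c1)
    return (c1 ^ w, c2 ^ u)          # (y, z)

def find_collision(E, y=0):
    """Fix the first output word to y, then wait for a birthday match on z."""
    seen = {}
    for u in range(SIZE):
        for v in range(SIZE):
            c1 = E.enc(u, v)
            w = c1 ^ y               # forces g1(u, v, w; c1) = y
            z = E.enc(v ^ c1, w) ^ u
            if z in seen:
                return seen[z], (u, v, w)
            seen[z] = (u, v, w)
    return None

E = ToyCipher(seed=7)
first, second = find_collision(E)
print(first, second, "oracle calls:", E.queries)
```

Because z takes only 2^n values, the loop stops after at most 2^n + 1 tuples by the pigeonhole principle; one would expect it to stop after roughly 2^{n/2} of them.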

It remains to consider the case of L being an arbitrary bijection. Define F̄_2 as
F_2 with f_3 replaced by f̄_3 = L ∘ f_3. Using the idea of equivalence classes on
compression functions, we prove that F_2 and F̄_2 are equally secure with respect
to collisions. Let Ā be a collision finding adversary for F̄_2. We construct a
collision finding adversary A for F_2, with oracle access to E, that uses Ā to output
a collision for F_2. Adversary A proceeds as follows. It forwards all queries made
by Ā to its own oracle. Eventually, Ā outputs two tuples (u, v, w), (u', v', w')
such that F̄_2(u, v, w) = F̄_2(u', v', w'). Denote by c_1 the block cipher outcome
on input of f_1(u, v, w) and by c_2 the outcome on input of f_2(u, v, w; c_1). Define
c'_1 and c'_2 similarly. By construction, as (u, v, w) and (u', v', w') form a collision
for F̄_2, we have L ∘ f_3(u, v, w; c_1, c_2) = L ∘ f_3(u', v', w'; c'_1, c'_2). Now, bijectivity
of L implies that f_3(u, v, w; c_1, c_2) = f_3(u', v', w'; c'_1, c'_2), and hence (u, v, w) and
(u', v', w') form a collision for F_2. (Recall that F_2 and F̄_2 only differ in the
finalization function f_3; the functions f_1 and f_2 are the same.) We thus obtain
adv^{col}_{F̄_2}(q) ≤ adv^{col}_{F_2}(q). The derivation in reverse order is the same by symmetry.
But F̄_2 satisfies (1) for L the identity function. Therefore, the attack described
in the first part of the proof applies to F̄_2, and thus to F_2. □

We demonstrate the impact of the attack by giving several example functions
that fall in the categorization. We stress that the requirement of Prop. 1 is in
fact solely a requirement on f_3; f_1 and f_2 can be any function.

2 If k balls are thrown in l bins, the α fullest bins in total contain at least αk/l balls.

Suppose F_2 uses a linear finalization function f_3. Say, f_3 is defined as follows:

    ( a_11 a_12 a_13 a_14 a_15 )
    ( a_21 a_22 a_23 a_24 a_25 ) (u, v, w, c_1, c_2)^T = (y, z)^T,

where addition and multiplication are done over the field GF(2^n). Now, if a_25 = 0
we set L = (0 1; 1 0), which corresponds to swapping y and z. If a_25 ≠ 0, we set
L = (1 −a_15 a_25^{-1}; 0 1), which corresponds to subtracting the second equation
a_15 a_25^{-1} times from the first one. The attack also covers designs whose
finalization function f_3 rotates or shuffles its inputs, such as MDC-2, where one
defines L so that the rotation gets undone. We elaborate on this in the full
version [17]. In general, if f_3 is a sufficiently simple add-rotate-xor function, it is
possible to derive a bijective L that makes (1) satisfied. Up to a degree, the attack
also covers general non-linear finalization functions. However, it clearly does not
cover all functions, and it remains an open problem to either close this gap or to
come up with a (possibly impractical) F_2 compression function that provably
achieves optimal collision resistance. One direction may be to start from the
compression function with non-linear finalization f_3 by Jetchev et al. [7], for
which collision resistance up to 2^{2n/3} queries is proven.
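
To see the choice of L at work in the linear case, the sketch below checks numerically that the left half of L ∘ f_3 no longer depends on c_2. It is only an illustration under assumptions of ours: the toy field GF(2^8) with the AES reduction polynomial replaces GF(2^n), and the coefficient rows are arbitrary values picked for the demonstration. Subtraction coincides with XOR in characteristic 2.

```python
from functools import reduce
from operator import xor

def gf_mul(a, b, poly=0x11B, n=8):
    """Multiply in GF(2^8) = GF(2)[x]/(x^8 + x^4 + x^3 + x + 1)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> n:
            a ^= poly
    return result

def gf_inv(a):
    """Inverse of a non-zero element, computed as a^(2^8 - 2)."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def f3(rows, inputs):
    """Linear finalization: (y, z) = rows * (u, v, w, c1, c2)^T."""
    return tuple(reduce(xor, (gf_mul(a, x) for a, x in zip(row, inputs)))
                 for row in rows)

def apply_L(rows, yz):
    """The bijection L from the text, chosen according to a_25."""
    a15, a25 = rows[0][4], rows[1][4]
    y, z = yz
    if a25 == 0:
        return (z, y)                                  # swap y and z
    return (y ^ gf_mul(gf_mul(a15, gf_inv(a25)), z), z)

rows = [(3, 7, 1, 9, 5), (2, 4, 8, 6, 11)]             # arbitrary demo coefficients
u, v, w, c1 = 0x12, 0x34, 0x56, 0x78
left_halves = {apply_L(rows, f3(rows, (u, v, w, c1, c2)))[0] for c2 in range(256)}
print(len(left_halves))   # prints 1: the left half is the same for every c2
```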


4 Double Length Hashing with 3 E-calls

Motivated by the negative result of Sect. 3, we target the existence of double
length hashing with three block cipher calls. We introduce a family of double
length compression functions making three cipher calls that achieve asymptotically
optimal 2^n collision resistance and preimage resistance significantly beyond
the birthday bound (up to 2^{3n/2} queries). We note that, although the preimage
bound is non-optimal, it closely approaches the generic bound dictated by the
pigeonhole-birthday attack (Sect. 2).

Let GF(2^n) be the field of order 2^n. We identify bit strings from {0,1}^n and
finite field elements in GF(2^n) to define addition and scalar multiplication over
{0,1}^n. In the family of double block length functions we propose in this section,
the functions f_1, f_2, f_3, f_4 of Fig. 2 will be linear functions over GF(2^n). For two
tuples x = (x_1, …, x_l) and y = (y_1, …, y_l) of elements from {0,1}^n, we define
by x · y their inner product Σ_{i=1}^{l} x_i y_i ∈ {0,1}^n.

Before introducing the design, we first explain the fundamental consideration
upon which the family is based. The security proofs of all DBL^{2n} functions
known in the literature (cf. Table 1) crucially rely on the property that one
block cipher evaluation defines the input to the other one. For DBL^{2n} functions
this can easily be achieved: any block cipher evaluation can take as input the
full 3n-bit input state (u, v, w). Considering the class of functions DBL^n, and
F_r of Fig. 2 in particular, this cannot be achieved: one block cipher
“processes” at most 2n out of 3n input bits. In our design, we slightly relax this
requirement, by requiring that any two block cipher evaluations define the input
to the third one. Although from a technical point of view one may expect that
this change causes optimal collision resistance to be harder or even impossible
to achieve, we will demonstrate that this is not the case due to new proof
techniques employed to analyze the collision resistance.

    F_A^3(u, v, w) = (y, z), where:
      c_1 ← E(u, v),
      k_2 ← a_1 · (u, v, c_1),   m_2 ← a_2 · (u, v, c_1, w),   y ← E(k_2, m_2) + m_2,
      k_3 ← a_3 · (u, v, c_1),   m_3 ← a_4 · (u, v, c_1, w),   z ← E(k_3, m_3) + m_3.

Fig. 3. The family of compression functions F_A^3, where A is a 4 × 4 matrix as
specified in the text. Arithmetic is done over GF(2^n).

Based on this key observation we propose the compression function design
of Fig. 3. Here,

    A = ( a_11 a_12 a_13 0    )
        ( a_21 a_22 a_23 a_24 )
        ( a_31 a_32 a_33 0    )
        ( a_41 a_42 a_43 a_44 )        (2)

is a 4 × 4 matrix over GF(2^n). Note that, provided A is invertible and a_24, a_44 ≠ 0,
any two block cipher evaluations of F_A^3 define (the inputs of) the third one.
For instance, evaluations of the second and third block cipher fix the vector
A(u, v, c_1, w)^T, which by invertibility of A fixes (u, v, c_1, w) and thus the first
block cipher evaluation. Evaluations of the first and second block cipher fix the
inputs of the third block cipher as a_24 ≠ 0. For the proofs of collision and
preimage resistance, however, we will need to posit additional requirements on
A. As we will explain, these requirements are easily satisfied.
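
A direct transcription of Fig. 3 into Python could look as follows. This is a sketch under stated assumptions, not a reference implementation: the block cipher is passed in as a callable `E(k, m)`, the helper `make_lazy_cipher` is a hypothetical lazily sampled stand-in for an ideal cipher, and `A_EXAMPLE` is the first example matrix that the text gives later in (4), with field elements encoded as integers (2 stands for x). The reduction polynomial x^128 + x^127 + x^126 + x^121 + 1 is the one quoted with those example matrices.

```python
import secrets

N = 128
POLY = (1 << 128) | (1 << 127) | (1 << 126) | (1 << 121) | 1

def gf_mul(a, b):
    """Multiply two elements of GF(2^128), encoded as integers below 2^128."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> N:
            a ^= POLY
    return result

def dot(row, vec):
    """Inner product over GF(2^128); addition is XOR."""
    acc = 0
    for coeff, x in zip(row, vec):
        acc ^= gf_mul(coeff, x)
    return acc

def compress(E, A, u, v, w):
    """F_A^3(u, v, w) = (y, z) as in Fig. 3, for a cipher E(k, m)."""
    c1 = E(u, v)
    state = (u, v, c1, w)
    # Rows 1 and 3 of A have a zero in the last column, so k2 and k3 ignore w.
    k2, m2, k3, m3 = (dot(row, state) for row in A)
    y = E(k2, m2) ^ m2
    z = E(k3, m3) ^ m3
    return y, z

def make_lazy_cipher():
    """Hypothetical stand-in for E: one lazily sampled permutation per key."""
    table, used = {}, {}
    def E(k, m):
        if (k, m) not in table:
            taken = used.setdefault(k, set())
            c = secrets.randbits(N)
            while c in taken:
                c = secrets.randbits(N)
            taken.add(c)
            table[(k, m)] = c
        return table[(k, m)]
    return E

A_EXAMPLE = [(0, 1, 2, 0), (1, 0, 0, 1), (0, 2, 1, 0), (0, 0, 0, 2)]
E = make_lazy_cipher()
y, z = compress(E, A_EXAMPLE, 1, 2, 3)
print(hex(y), hex(z))
```

In practice E would be a concrete 128-bit block cipher with a 128-bit key; the lazy table is used here only to keep the sketch free of external dependencies.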

In the remainder of this section, we state our results on the collision resistance
of F_A^3 in Sect. 4.1 and on the preimage resistance in Sect. 4.2.

4.1 Collision Resistance of F_A^3

We prove that, provided its underlying matrix A satisfies some simple conditions,
F_A^3 satisfies optimal collision resistance. In more detail, we pose the following
requirements on A:

• A is invertible;

• a_12, a_13, a_24, a_32, a_33, a_44 ≠ 0;

• a_12 ≠ a_32 and a_13 ≠ a_33.

We refer to the logical AND of these requirements as colreq.
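
The three requirements are mechanical to check. The sketch below is one way to do it, with hypothetical helper names of ours (`is_invertible`, `satisfies_colreq`); it works over GF(2^128) with the polynomial x^128 + x^127 + x^126 + x^121 + 1 and tests invertibility by Gaussian elimination. The two test matrices are the example matrices that appear later in (4).

```python
N = 128
POLY = (1 << 128) | (1 << 127) | (1 << 126) | (1 << 121) | 1

def gf_mul(a, b):
    """Multiply two elements of GF(2^128), encoded as integers below 2^128."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> N:
            a ^= POLY
    return result

def gf_inv(a):
    """Inverse of a non-zero element via a^(2^128 - 2), square-and-multiply."""
    result, exponent = 1, (1 << N) - 2
    while exponent:
        if exponent & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        exponent >>= 1
    return result

def is_invertible(A):
    """Gaussian elimination over GF(2^128); True iff every column has a pivot."""
    M = [list(row) for row in A]
    size = len(M)
    for col in range(size):
        pivot = next((r for r in range(col, size) if M[r][col]), None)
        if pivot is None:
            return False
        M[col], M[pivot] = M[pivot], M[col]
        inv = gf_inv(M[col][col])
        for r in range(col + 1, size):
            factor = gf_mul(M[r][col], inv)
            if factor:
                M[r] = [x ^ gf_mul(factor, p) for x, p in zip(M[r], M[col])]
    return True

def satisfies_colreq(A):
    a = lambda i, j: A[i - 1][j - 1]          # 1-based indexing as in the text
    nonzero = all(a(i, j) != 0
                  for i, j in [(1, 2), (1, 3), (2, 4), (3, 2), (3, 3), (4, 4)])
    distinct = a(1, 2) != a(3, 2) and a(1, 3) != a(3, 3)
    return is_invertible(A) and nonzero and distinct

A1 = [(0, 1, 2, 0), (1, 0, 0, 1), (0, 2, 1, 0), (0, 0, 0, 2)]
A2 = [(0, 1, 1, 0), (1, 1, 0, 1), (0, 2, 3, 0), (1, 0, 2, 2)]
print(satisfies_colreq(A1), satisfies_colreq(A2))
```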


Theorem 1. Let n ∈ ℕ. Suppose A satisfies colreq. Then, for any positive
integral values t_1, t_2,

    adv^{col}_{F_A^3}(q) ≤ (2 t_2^2 q + 3 t_2 q + 11 q + 3 t_1 t_2^2 + 7 t_1 t_2) / (2^n − q)
                           + q^2 / (t_1 (2^n − q))
                           + 3 · 2^n · ( e q / (t_2 (2^n − q)) )^{t_2} .        (3)

The proof is given in Sect. 5. The basic proof idea is similar to existing proofs in the
literature (e.g., [16,27]) and is based on the usage of thresholds t_1, t_2. For increasing
values of t_1, t_2 the first term of the bound increases, while the second two terms
decrease. Although the proof derives basic proof principles from the literature, for the
technical part we deviate from existing proof techniques in order to get a bound
that is “as tight as possible”. In particular, we introduce the usage of wish lists in
the context of collisions, an approach that allows for significantly better bounds.
Wish lists have been introduced by Armknecht et al. [2] and Lee et al. [11,13] for the
preimage resistance analysis of DBL^{2n} functions, but they have never been used
for collision resistance as there never was a need to do so. Our analysis relies on
this proof methodology. As more block cipher evaluations are involved in a
collision (one collision needs six block cipher calls while a preimage requires three),
the analysis becomes more technical and delicate.

The goal now is to find a good threshold between the first term and the latter
two terms of (3). To this end, let ε > 0 be any parameter. We put t_1 = q and
t_2 = 2^{nε} (we can assume t_2 to be integral). Then, the bound simplifies to

    adv^{col}_{F_A^3}(q) ≤ (5 · 2^{2nε} q + 10 · 2^{nε} q + 11 q) / (2^n − q)
                           + q / (2^n − q)
                           + 3 · 2^n · ( e q / (2^{nε} (2^n − q)) )^{2^{nε}} .

From this, we find that for any ε > 0 we have adv^{col}_{F_A^3}(2^n/2^{3nε}) → 0 for n → ∞.
Hence, the F_A^3 compression function achieves close to optimal 2^n collision security
for n → ∞. For n = 128, we evaluate the bound in more detail in [17]. The
advantage hits 1/2 for log_2 q ≈ 118.3, relatively close to the threshold 127.5 for
q(q + 1)/2^{2n}. For larger values of n this gap approaches 0.
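
For readers who want to experiment with the thresholds, the sketch below evaluates the right-hand side of (3) numerically. It rests on an assumption of ours about the shape of (3): we take its three terms to be (2t_2²q + 3t_2q + 11q + 3t_1t_2² + 7t_1t_2)/(2^n − q), q²/(t_1(2^n − q)) and 3·2^n·(eq/(t_2(2^n − q)))^{t_2}. The function name `collision_bound` is hypothetical, and the last term is handled in the log domain to avoid floating-point overflow.

```python
import math

def collision_bound(n, q, t1, t2):
    """Right-hand side of (3) as a float, capped at 1 (assumed form of (3))."""
    denom = 2**n - q
    first = (2 * t2 * t2 * q + 3 * t2 * q + 11 * q
             + 3 * t1 * t2 * t2 + 7 * t1 * t2) / denom
    second = q * q / (t1 * denom)
    log2_third = (math.log2(3) + n
                  + t2 * (math.log2(math.e) + math.log2(q)
                          - math.log2(t2) - math.log2(denom)))
    # Anything above 1 is irrelevant because the result is capped at 1 below.
    third = 2.0 ** min(log2_third, 1.0)
    return min(1.0, first + second + third)

# Example: n = 128, q = 2^100 queries, thresholds t1 = q and t2 = 16.
n, q = 128, 2**100
print(collision_bound(n, q, t1=q, t2=16))
```

With t_1 = q the first term collapses to (5t_2² + 10t_2 + 11)q/(2^n − q), which is the simplification used in the text.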


4.2 Preimage Resistance of F_A^3

In this section we consider the preimage resistance of F_A^3. Though we do not
obtain optimal preimage resistance (which is impossible to achieve after all, due
to the generic bounds of the pigeonhole-birthday attack of Sect. 2), we achieve
preimage resistance up to 2^{3n/2} queries, much better than the preimage bounds
on MDC-2 and MDC-4 and relatively close to the generic bound. Yet, for the
proof to hold we need to put slightly stronger requirements on A:

• A − (B_1 0; B_2 0) is invertible for any B_1, B_2 ∈ {(oo), (J°), (J ?)}. In the
remainder, we write [B_1/B_2] to denote the subtracted matrix;

• a_12, a_13, a_24, a_32, a_33, a_44 ≠ 0;

• a_12 ≠ a_32, a_13 ≠ a_33, and a_24 ≠ a_44.

We refer to the logical AND of these requirements as prereq. We remark that
prereq ⟹ colreq, and that matrices satisfying prereq are easily found. Simple
matrices complying with these conditions over the field GF(2^128) are


    ( 0 1 2 0 )        ( 0 1 1 0 )
    ( 1 0 0 1 )        ( 1 1 0 1 )
    ( 0 2 1 0 ) ,      ( 0 2 3 0 ) .        (4)
    ( 0 0 0 2 )        ( 1 0 2 2 )

These are the matrices corresponding to the compression functions of Fig. 1.
Here, we use x^128 + x^127 + x^126 + x^121 + 1 as our irreducible polynomial and we
represent bit strings as polynomials in the obvious way (1 = 1, 2 = x, 3 = 1 + x).
Note that the choice of matrix A influences the efficiency of the construction.
The first matrix of (4) has as few non-zero entries as possible, which reduces the
amount of computation.

Theorem 2. Let n ∈ ℕ. Suppose A satisfies prereq. Then, for any positive
integral value t, provided t ≤ q,


adv/r^) < - 


-(SO 



( 5 ) 


The proof is given in the full version of this paper [17]. As for the bound on the
collision resistance (Thm. 1), the idea is to make a smart choice of t to minimize
this bound. Let ε > 0 be any parameter. Then, for t = q^{1/3}, the bound simplifies
to


adv^ re ( 9 ) < 


6 g 2 / 3 + IS ? 1 / 3 + 26 
2 ” — 2 




From this, we find that for any ε > 0 we have adv^{epre}_{F_A^3}(2^{3n/2}/2^{nε}) → 0 for
n → ∞. Hence, the F_A^3 compression function achieves close to 2^{3n/2} preimage security
for n → ∞. For n = 128, we evaluate the bound in more detail in [17]. The
advantage hits 1/2 for log_2 q ≈ 180.3, relatively close to the threshold 191.5 for
q^2/2^{3n}. For larger values of n this gap approaches 0.

The result shows that F_A^3 with A compliant to prereq satisfies preimage
resistance up to about 2^{3n/2} queries. We note that our bound is the best possible
for this design, which we show by demonstrating a preimage-finding adversary that
with high probability succeeds in at most O(2^{3n/2}) queries. Let α ∈ ℕ. The adversary
proceeds as follows.

• Make α2^n queries to the block cipher corresponding to the bottom-left
position of Fig. 3. One expects to find α tuples (k_2, m_2, c_2) that satisfy
m_2 + c_2 = y;

• Repeat the first step for the bottom-right position. One expects to find α
tuples (k_3, m_3, c_3) satisfying m_3 + c_3 = z;

• By invertibility of A, any choice of (k_2, m_2, c_2) and (k_3, m_3, c_3) uniquely
defines a tuple (u, v, c_1, w) for the F_A^3 evaluation. Likely, the emerged tuples
(u, v, c_1) are all different, and we find about α^2 such tuples;

• Varying over all α^2 tuples (u, v, c_1), query (u, v) to the block cipher. If it
responds c_1, we have obtained a preimage for F_A^3.

In the last round one expects to find a preimage if α^2/2^n = 1, or equivalently if
α = 2^{n/2}. The first and second round both require approximately 2^{3n/2} queries,
and the fourth round takes 2^n queries. In total, the attack is done in approximately
2 · 2^{3n/2} + 2^n queries.
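
The four steps can be tried out on a scaled-down instance. The Python sketch below does so under assumptions of ours: n = 8, the toy field GF(2^8) with the AES polynomial instead of GF(2^128), seeded per-key permutations in place of the ideal cipher, the first example matrix of (4), and α = 128 (far more than 2^{n/2}, so that a preimage is found with overwhelming probability). All function names are hypothetical.

```python
import random

N, POLY = 8, 0x11B
SIZE = 1 << N
A = [(0, 1, 2, 0), (1, 0, 0, 1), (0, 2, 1, 0), (0, 0, 0, 2)]

def gf_mul(a, b):
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> N:
            a ^= POLY
    return result

def gf_inv(a):
    result = 1
    for _ in range(SIZE - 2):            # a^(2^8 - 2)
        result = gf_mul(result, a)
    return result

def mat_vec(M, x):
    out = []
    for row in M:
        acc = 0
        for coeff, xi in zip(row, x):
            acc ^= gf_mul(coeff, xi)
        out.append(acc)
    return tuple(out)

def invert(M):
    """Gauss-Jordan inversion over GF(2^8); assumes M is invertible."""
    size = len(M)
    aug = [list(row) + [int(i == j) for j in range(size)] for i, row in enumerate(M)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col])
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = gf_inv(aug[col][col])
        aug[col] = [gf_mul(inv, x) for x in aug[col]]
        for r in range(size):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [x ^ gf_mul(factor, p) for x, p in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]

class ToyCipher:
    def __init__(self, seed=0):
        self.seed, self.perms = seed, {}
    def enc(self, k, m):
        if k not in self.perms:
            perm = list(range(SIZE))
            random.Random(self.seed * SIZE + k).shuffle(perm)
            self.perms[k] = perm
        return self.perms[k][m]

def compress(E, u, v, w):
    c1 = E.enc(u, v)
    k2, m2, k3, m3 = mat_vec(A, (u, v, c1, w))
    return (E.enc(k2, m2) ^ m2, E.enc(k3, m3) ^ m3)

def feed_forward_hits(E, target, alpha):
    """Steps 1 and 2: collect up to alpha pairs (k, m) with E(k, m) + m = target."""
    hits = []
    for k in range(SIZE):
        for m in range(SIZE):
            if E.enc(k, m) ^ m == target:
                hits.append((k, m))
                if len(hits) == alpha:
                    return hits
    return hits

def find_preimage(E, y, z, alpha=128):
    A_inv = invert(A)
    left = feed_forward_hits(E, y, alpha)
    right = feed_forward_hits(E, z, alpha)
    for k2, m2 in left:                  # step 3: each pair fixes (u, v, c1, w)
        for k3, m3 in right:
            u, v, c1, w = mat_vec(A_inv, (k2, m2, k3, m3))
            if E.enc(u, v) == c1:        # step 4: does the first call agree?
                return (u, v, w)
    return None

E = ToyCipher(seed=1)
print(find_preimage(E, 0x3C, 0xA5))
```

Any tuple returned is a valid preimage by construction: the first call yields c_1, the matrix A then reproduces the stored (k_2, m_2) and (k_3, m_3), and these were selected so that the feed-forward gives y and z.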

5 Proof of Thm. 1

The proof of collision resistance of F_A^3 follows the basic spirit of [16], but crucially
differs in the way the probability bounds are computed. A new approach here
is the usage of wish lists. The idea of wish lists is not new: it has been
introduced by Armknecht et al. [2] and Lee et al. [11,13] for double block length
compression functions, and used by Mennink [16] for the analysis of MDC-4. In
these works, however, wish lists are solely used for the analysis of preimage
resistance rather than collision resistance. Given that in a collision more block
cipher evaluations are involved, the analysis becomes more complex. At a high level,
wish lists rely on the idea that in order to find a collision, the adversary must at
some point make a query that “completes this collision” together with some other
queries already in the query history. Wish lists keep track of such query tuples, and
the adversary’s goal is to obtain, at some point, a query tuple that is in such a wish
list. A more technical treatment can be found in the proof of Lem. 1.

We consider any adversary that has query access to its oracle E and makes
q queries stored in a query history Q_q. Its goal is to find a collision for F_A^3, in
which it by definition only succeeds if it obtains a query history Q_q that satisfies
configuration coll(Q_q) of Fig. 4. This means,

    adv^{col}_{F_A^3}(q) = Pr(coll(Q_q)).        (6)

For the sake of readability of the proof, we label the block cipher positions in
Fig. 4 as follows. In the left F_A^3 evaluation (on input (u, v, w)), the block ciphers
are labeled 1L (the one on input (u, v)), 2L (the bottom left one), and 3L (the
bottom right one). The block ciphers for the right F_A^3 evaluation are labeled
1R, 2R, 3R in a similar way. When we say “a query 1L”, we refer to a query that
in a collision occurs at position 1L.

Fig. 4. Configuration coll(Q). The configuration is satisfied if Q contains six (possibly
the same) queries that satisfy this setting. We require (u, v, w) ≠ (u', v', w').

For the analysis of Pr(coll(Q_q)) we introduce an auxiliary event aux(Q_q). Let
t_1, t_2 > 0 be any integral values. We define aux(Q_q) = aux_1(Q_q) ∨ ⋯ ∨ aux_4(Q_q),
where

    aux_1(Q_q): |{(k_i, m_i, c_i), (k_j, m_j, c_j) ∈ Q_q : i ≠ j ∧ m_i + c_i = m_j + c_j}| > t_1;
    aux_2(Q_q): max_{z ∈ {0,1}^n} |{(k_i, m_i, c_i) ∈ Q_q : a_1 · (k_i, m_i, c_i) = z}| > t_2;
    aux_3(Q_q): max_{z ∈ {0,1}^n} |{(k_i, m_i, c_i) ∈ Q_q : a_3 · (k_i, m_i, c_i) = z}| > t_2;
    aux_4(Q_q): max_{z ∈ {0,1}^n} |{(k_i, m_i, c_i) ∈ Q_q : m_i + c_i = z}| > t_2.

By basic probability theory, we obtain for (6):

    Pr(coll(Q_q)) ≤ Pr(coll(Q_q) ∧ ¬aux(Q_q)) + Pr(aux(Q_q)).        (7)

We start with the analysis of Pr(coll(Q_q) ∧ ¬aux(Q_q)). For obtaining a query
history that fulfills configuration coll(Q_q), it may be the case that a query
appears at multiple positions. For instance, the queries at positions 1L and 2R may
be the same. We split the analysis of coll(Q_q) into essentially all different possible
cases, but we do this in two steps. In the first step, we distinguish whether a
query occurs in both words at the same position. We define for binary α_1, α_2, α_3
by coll_{α_1α_2α_3}(Q) the configuration coll(Q) of Fig. 4 restricted to

    1L = 1R ⟺ α_1 = 1,   2L = 2R ⟺ α_2 = 1,   3L = 3R ⟺ α_3 = 1.

By construction, coll(Q_q) ⟹ ⋁_{α_1,α_2,α_3 ∈ {0,1}} coll_{α_1α_2α_3}(Q_q), and from (6)
and (7) we obtain the following bound on adv^{col}_{F_A^3}(q):

    adv^{col}_{F_A^3}(q) ≤ Σ_{α_1,α_2,α_3 ∈ {0,1}} Pr(coll_{α_1α_2α_3}(Q_q) ∧ ¬aux(Q_q)) + Pr(aux(Q_q)).        (8)

Note that we did not make a distinction yet whether or not a query occurs at
two “different” positions (e.g. at positions 1L and 2R). These cases are analyzed
for each of the sub-configurations separately, as becomes clear later. The
probabilities Pr(coll_{α_1α_2α_3}(Q_q) ∧ ¬aux(Q_q)) for the different choices of α_1, α_2, α_3 are
bounded in Lems. 1–4. The proofs are rather similar, and we only bound the
probability on coll_{000}(Q_q) in full detail (Lem. 1). A bound on Pr(aux(Q_q)) is
given in Lem. 5. A part of the proof of Lem. 1 and the proofs of Lems. 2–5 are
given in [17].

Lemma 1. Pr(coll_{000}(Q_q) ∧ ¬aux(Q_q)) ≤ (t_2 q + 7q + 3 t_1 t_2^2 + 3 t_1 t_2) / (2^n − q).

Proof. Sub-configuration coll_{000}(Q_q) is given in Fig. 5. The block cipher queries
at positions a and !a are required to be different, and so are the ones at positions
b, !b and c, !c.

Fig. 5. Configuration coll_{000}(Q). We require (u, v, w) ≠ (u', v', w').

We consider the probability of the adversary finding a solution to configuration
coll_{000}(Q_q) such that Q_q satisfies ¬aux(Q_q). Consider the ith query, for i ∈
{1, …, q}. We say this query is a winning query if it makes coll_{000}(Q_i) ∧ ¬aux(Q_i)
satisfied for some set of other queries in the query history Q_{i−1}. We can assume
the ith query does not make aux(Q_i) satisfied: if it did, by definition it could not
be a winning query.

Recall that, although we narrowed down the number of possible positions for
a winning query to occur (in coll_{000}(Q_q) it cannot occur at both 1L and 1R, at
both 2L and 2R, or at both 3L and 3R), it may still be the case that such a
query contributes to multiple “different” positions, e.g. 1L and 2R. Note that
by construction, a winning query can contribute to at most three block cipher
positions of Fig. 5. In total, there are 26 sets of positions at which the winning
query can contribute at the same time. Discarding symmetric cases caused by
swapping (u, v, w) and (u', v', w'), one identifies the following 13 sets of positions:

    S_1 = {1L},   S_4 = {1L,2L},   S_7 = {1L,2R},   S_10 = {1L,2L,3L},
    S_2 = {2L},   S_5 = {1L,3L},   S_8 = {1L,3R},   S_11 = {1L,2L,3R},
    S_3 = {3L},   S_6 = {2L,3L},   S_9 = {2L,3R},   S_12 = {1L,2R,3L},
                                                    S_13 = {1L,2R,3R}.

Note that there are many more symmetric cases among these, but we are not
allowed to discard those as these may result in effectively different collisions.
For j = 1, …, 13 we denote by coll_{000:S_j}(Q) configuration coll_{000}(Q) with the
restriction that the winning query must appear at the positions in S_j. By basic
probability theory,

    Pr(coll_{000}(Q_q) ∧ ¬aux(Q_q)) ≤ Σ_{j=1}^{13} Pr(coll_{000:S_j}(Q_q) ∧ ¬aux(Q_q)).        (9)
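
The counts 26 and 13 can be confirmed by brute force. In the sketch below (helper names are ours), a position set picks for each cipher index 1, 2, 3 either nothing, the left position or the right position, which excludes sets containing both iL and iR; swapping (u, v, w) with (u', v', w') exchanges L and R.

```python
from itertools import product

def position_sets():
    """All non-empty sets with at most one of iL, iR for each i in {1, 2, 3}."""
    sets = []
    for choice in product((None, "L", "R"), repeat=3):
        s = frozenset(f"{i}{side}" for i, side in enumerate(choice, start=1) if side)
        if s:
            sets.append(s)
    return sets

def swap(s):
    """Effect of exchanging the left and right evaluations."""
    return frozenset(p[0] + ("R" if p[1] == "L" else "L") for p in s)

PAPER_SETS = [
    {"1L"}, {"2L"}, {"3L"},
    {"1L", "2L"}, {"1L", "3L"}, {"2L", "3L"},
    {"1L", "2R"}, {"1L", "3R"}, {"2L", "3R"},
    {"1L", "2L", "3L"}, {"1L", "2L", "3R"}, {"1L", "2R", "3L"}, {"1L", "2R", "3R"},
]

all_sets = position_sets()
classes = {frozenset({s, swap(s)}) for s in all_sets}
print(len(all_sets), len(classes))   # prints: 26 13

# Each symmetry class contains exactly one of the sets S_1, ..., S_13.
assert all(sum(frozenset(s) in cls for s in PAPER_SETS) == 1 for cls in classes)
```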

coll_{000:S_1}(Q_q). Rather than considering the success probability of the ith query
and then summing over i = 1, …, q (as is done in the analyses behind all collision
security proofs of Table 1), the approach in this proof is to focus on “wish lists”.
Intuitively, a wish list is a continuously updated sequence of
query tuples that would make configuration coll_{000:S_1}(Q_q) satisfied. During the
attack of the adversary, we maintain an initially empty wish list W_{S_1}. Consider
configuration coll_{000}(Q) with the query at position S_1 = {1L} left out (see [17]
for a graphical intuition). If a new query is made, suppose it fits this configuration
for some other queries in the query history (the new query appearing at least
once), jointly representing queries at positions {2L, 3L, 1R, 2R, 3R}. Then the
corresponding tuple (u, v, c_1) is added to W_{S_1}. Note that this tuple is uniquely
determined by the queries at 2L and 3L by invertibility of A, but different
combinations of queries may define the same wish. The latter does, however, not
invalidate the analysis: this is covered by the upper bound on |W_{S_1}| that will be
computed later in the proof, and will simply render a slightly worse bound.

As we have restricted to the case of the winning query only occurring at the
position of S_1, we can assume a query never adds itself to a wish list (footnote 3).
Clearly, in order to find a collision for F_A^3 in this sub-configuration, the adversary
needs to wish for a query at least once. Suppose the adversary makes a query E(k, m)
where (k, m, c) ∈ W_{S_1} for some c. We say that (k, m, c) is wished for, and the
wish is granted if the query response equals c. As the adversary makes at most q
queries, such a wish is granted with probability at most 1/(2^n − q), and the same
holds for inverse queries. By construction, each element from W_{S_1} can be wished for
only once, and we find that the adversary finds a collision with probability at
most |W_{S_1}|/(2^n − q).

3 A winning query that would appear at multiple positions is counted in coll_{000:S_j}(Q_q)
for some other set S_j.

Now, it suffices to upper bound the size of the wish list W_{S_1} after q queries,
and to this end we bound the number of solutions to configuration coll_{000:S_1}(Q_q).
By ¬aux_1(Q_q), the configuration has at most t_1 choices for 2L, 2R. For any such
choice, by ¬aux_2(Q_q) we have at most t_2 choices for 1R. Any such choice fixes
w' (as a_24 ≠ 0), and thus the query at position 3R, and consequently z. By
¬aux_4(Q_q), we have at most t_2 choices for 3L. The queries at positions 2L and
3L uniquely fix (u, v, c_1) by invertibility of A. We find |W_{S_1}| ≤ t_1 t_2^2, and hence
in this setting a collision is found with probability at most t_1 t_2^2/(2^n − q).

coll_{000:S_j}(Q_q) for j = 2, …, 13. In [17], Pr(coll_{000:S_j}(Q_q) ∧ ¬aux(Q_q)) is
bounded by t_1 t_2^2/(2^n − q) for j = 2, 3, by q/(2^n − q) for j = 4, 5, 6, 10, 11, 12, 13,
by t_1 t_2/(2^n − q) for j = 7, 8, and by (t_1 t_2 + t_2 q)/(2^n − q) for j = 9.

The proof is now completed by adding all bounds in accordance with (9). □
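
The final addition can be carried out mechanically. The sketch below sums the thirteen numerators over the common denominator 2^n − q, treating monomials as dictionary keys; the function name is ours, and we take the bound for j = 2, 3 to be t_1 t_2^2/(2^n − q), like the one for S_1, which is an assumption on our part.

```python
from collections import Counter

def numerator(j):
    """Numerator of the bound for coll_{000:S_j}, over the denominator 2^n - q."""
    if j in (1, 2, 3):                       # j = 2, 3: assumed equal to the S_1 bound
        return Counter({"t1*t2^2": 1})
    if j in (4, 5, 6, 10, 11, 12, 13):
        return Counter({"q": 1})
    if j in (7, 8):
        return Counter({"t1*t2": 1})
    if j == 9:
        return Counter({"t1*t2": 1, "t2*q": 1})
    raise ValueError(f"no such position set: S_{j}")

total = Counter()
for j in range(1, 14):
    total += numerator(j)
print(dict(total))
```

Under this reading the sum is 3 t_1 t_2^2 + 7q + 3 t_1 t_2 + t_2 q, all over 2^n − q.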

Lemma 2. Pr (collioo(Qg) A -.aux(Q g )) < • 

Lemma 3. Pr(coll aia2 „ 3 (Q 9 )A^aux(Q g )) < for a\a.2a.z G 

{ 010 , 001 }. 

Lemma 4. Pr(coll_{α_1α_2α_3}(Q_q) ∧ ¬aux(Q_q)) = 0 when α_1 + α_2 + α_3 ≥ 2.

Lemma 5. Pr(aux(Q_q)) ≤ q^2/(t_1(2^n − q)) + 3 · 2^n · ( e q / (t_2 (2^n − q)) )^{t_2}.

From (8) and the results of Lems. 1–5 we conclude the bound of (3). This
completes the proof of Thm. 1.

6 Conclusions 

In the area of double block length hashing, where a 3n-to-2n-bit compression
function is constructed from n-bit block ciphers, all optimally secure constructions
known in the literature employ a block cipher with 2n-bit key space. We
have reconsidered the principle of double length hashing, focusing on double
length hashing from a block cipher with n-bit message and key space. Unlike in
the DBL^{2n} class, we demonstrate that there does not exist any optimally secure
design with reasonably simple finalization function that makes two cipher
calls. By allowing one extra call, optimal collision resistance can nevertheless be
achieved, as we have proven by introducing our family of designs F_A^3.

In our quest for optimal collision secure compression function designs, we had
to resort to designs with three block cipher calls rather than two, which moreover
are not parallelizable. This entails an efficiency loss compared to MDC-2, MJH,
and Jetchev et al.’s construction. On the other hand, our family of functions
is based on simple arithmetic in the finite field: unlike constructions by Stam
[25,26], Lee and Steinberger [14], and Jetchev et al. [7], our design does not make
use of full field multiplications. The example matrices A given in (4) are designed
to use a minimal number of non-zero elements. We note that specific choices of
A may be more suited for this construction to be used in an iterated design.

This work provides new insights in double length hashing, but also results 
in interesting research questions. Most importantly, is it possible to construct 
other collision secure $F^3$ constructions (beyond our family of functions
$F^3_A$) that achieve optimal $2^{5n/3}$ preimage resistance? Given the
negative collision resistance result for a wide class of compression functions
$F^2$, is it possible to achieve optimal collision security in the iteration
anyhow? This question is beyond the scope of this work. On the other hand, in
line with ideas of [?], is it possible to achieve an impossibility result for
$F^3$ restricted to the xor-only design (where $f_1, \ldots, f_4$ only xor
their parameters)?

Abstract. We extend and improve biclique attacks, which were recently
introduced for the cryptanalysis of block ciphers and hash functions. 
While previous attacks required a primitive to have a key or a message 
schedule, we show how to mount attacks on the primitives with these 
parameters fixed, i.e. on permutations. We introduce the concept of sliced 
bicliques, which is a translation of regular bicliques to the framework with 
permutations. 

The new framework allows us to convert preimage attacks into collision
attacks and to derive the first collision attacks on the reduced SHA-3
finalist Skein in the hash function setting, for up to 11 rounds. We also
demonstrate new preimage attacks on the reduced Skein and on the output
transformation of the reduced Grøstl. Finally, the sophisticated technique of
message compensation gets a simple explanation with bicliques.


Keywords: Skein, SHA-3, hash function, collision attack, preimage 
attack, biclique, permutation, Grøstl.


1 Introduction 

Meet-in-the-middle attacks have been known in cryptanalysis at least since the 
analysis of Double-DES [?], but received less attention in the 1990s and early
2000s because of the more complex key schedules in contemporary block ciphers.
They regained prominence with the introduction of the splice-and-cut framework
by Aoki and Sasaki for hash functions [?]. Aoki and Sasaki considered various designs
and demonstrated how to construct pseudo-preimages for compression functions 
based on block ciphers. Pseudo-preimages can be converted to regular preimages, 
though this reduces the advantage previously gained over brute force. 

While the first splice-and-cut attacks were quite simple, they quickly became 
more sophisticated as cryptanalysts tried to increase the number of rounds
broken [?]. That number for the first attacks was determined by the length
of chunks: two sections of a primitive, each independent of its own set of
key/message bits, called neutral bits. For example, the two DES calls in
Double-DES are chunks, each independent of half of the key. Later research
showed how to start the attack with a sophisticated construction (a so-called
initial structure) over several rounds to increase the total number of rounds
in the attack [?], which culminated in the concept of bicliques [?]. While
initial structures relied on slow diffusion, bicliques do not need that
condition. Instead, they translate the condition on internal states being
suitable for meet-in-the-middle attacks into requirements on how these states
map to each other under different sub-transformations.

X. Wang and K. Sako (Eds.): ASIACRYPT 2012, LNCS 7658, pp. 544-gtTT] 2012.

Bicliques. The new biclique technique [?] led to a few surprising attacks
on AES, though many of them had only a constant-factor improvement over
exhaustive search. The technique has also influenced attacks reducing the
security level of the full Square [?], Kasumi [?], and IDEA [15]. All these
attacks need a small but noticeable number of operations to test a single key,
and in our opinion they
have smaller potential. Indeed, even a single operation for each key implies a 
lower bound on the complexity which is not far from exhaustive search. Also 
from the technical point of view, the use of bicliques in those settings is not 
much different from earlier use of initial structures. 

From Parametrized Transformations to Permutations. The key/message schedule
is a crucial element in the biclique attacks. In Section 2 we show how to
enumerate $N$ message candidates with only $2\sqrt{N}$ states.

However, there are several settings where an attacker cannot manipulate a
scheduled parameter, or where there is no schedule at all. For example,
preimage attacks on blockcipher-based hash functions first consider a
compression function and produce pseudo-preimages, and then run a
computationally expensive meet-in-the-middle attack to produce real preimages.
If an attacker wants to reduce the cost by avoiding the second step, then he
has to set the chaining value (CV) to the original initial value (IV). If the
compression function is based on the Matyas-Meyer-Oseas mode with $E_K$ as a
block cipher,

    $F(CV, M) = E_{CV}(M) \oplus M$,

where $M$ stands for a message block, then the attacker analyzes the
permutation $E_{IV}(\cdot)$.

Another example is the SHA-3 finalist Grøstl with the output transformation
$x \leftarrow \mathrm{Truncate}(x \oplus P(x))$, where $P$ is a fixed
permutation. Therefore, the translation of the biclique technique to
permutations is quite promising.

Permutations have been subject to a few recent attacks [?] which use a
predecessor of the biclique, the initial structure. A natural question is
whether the more general concept of bicliques can be carried over to this
setting and, if so, whether the advantages of long bicliques can be exploited
in the same way as in the attacks on AES.

Collisions for the MMO-Based Primitives. While the Matyas-Meyer-Oseas
(MMO) and Davies-Meyer (DM) modes are equally resistant to generic attacks
[?], they differ considerably when dedicated methods are considered.
Collision attacks typically fix the chaining value, so in the DM mode

    $F(CV, M) = E_M(CV) \oplus CV$

an attacker is able to manipulate the round injections through the
modification of $M$, while in the MMO mode he is able to choose only the
input. From our point of view, the famous collision attacks on the MD4/SHA
family [?] demonstrate that the first setting is much more friendly to the
attacker. Indeed, the most powerful collision search method, differential
cryptanalysis, works with related-key characteristics in the DM mode, and with
regular characteristics in the MMO mode. Related-key attacks on the full AES
[?] hint that the former setting is more suitable.
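
The difference between the two modes can be made concrete with a small sketch. The code below is only an illustration: `toy_cipher` is a hypothetical 16-bit stand-in for $E_K$ (it is not Threefish or any real cipher), and all function names are assumptions introduced here. It shows that once CV is fixed to IV, the MMO mode exposes a single fixed permutation $E_{IV}$ to the attacker, whereas in the DM mode every message block selects a different key.

```python
MASK = 0xFFFF  # toy 16-bit block size, chosen only to keep the demo small


def toy_cipher(key: int, block: int) -> int:
    """Hypothetical stand-in for E_K(block); a permutation of the block for each key."""
    x = block & MASK
    for r in range(4):
        x = (x + key + r) & MASK               # key injection (bijective in x)
        x = ((x << 5) | (x >> 11)) & MASK      # rotate left by 5 (bijective)
        x ^= 0x9E37                            # constant xor (bijective)
    return x


def mmo(cv: int, m: int) -> int:
    """Matyas-Meyer-Oseas: F(CV, M) = E_CV(M) xor M."""
    return toy_cipher(cv, m) ^ m


def dm(cv: int, m: int) -> int:
    """Davies-Meyer: F(CV, M) = E_M(CV) xor CV."""
    return toy_cipher(m, cv) ^ cv


if __name__ == "__main__":
    iv = 0x1234
    # With CV fixed to IV, MMO only ever evaluates the single permutation E_IV.
    images = {toy_cipher(iv, m) for m in range(MASK + 1)}
    print(len(images) == MASK + 1)
    print(hex(mmo(iv, 0x0042)), hex(dm(iv, 0x0042)))
```

In the `mmo` function the message only enters as the cipher input, so an attacker who keeps `cv` fixed has no control over the key-dependent round injections.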

The hash function Skein follows the MMO mode and is an object of our
analysis. The existing near-collision attacks on the compression function of
Skein [?] are essentially free-start collisions, i.e. they inject the
difference in the chaining value or the tweak. Therefore, we conclude that
mounting a regular collision attack on a hash function based on MMO is quite
difficult. The very recent pseudo-collision attack [?] on Skein is a great
step forward, as we discuss later in the text.

Our Contributions 

In Section 3 we introduce the new notion of a sliced biclique as a translation
of a regular biclique to permutations. The new concept helps to carry over the
meet-in-the-middle attacks and the biclique technique to permutations without
modifiable parameters. We use the term parameters for both keys and messages.

We improve a very recent technique of finding pseudo-collisions with
pseudo-preimages and show how to get regular collision attacks on the
MMO-based primitives (Section 4). We obtain the first collision attacks on the
reduced-round Skein hash function (Section 5). The new attacks are also
translated to new preimage attacks on Skein (see the extended version of this
paper [?]).

Then we consider the output transformation of the SHA-3 finalist Grøstl-256
and derive the first shortcut 6-round attack (Section 6). Finally, we analyze a
procedure from earlier meet-in-the-middle attacks called message compensation
(Section 7). Previously ad hoc, it gets a clear interpretation as a sliced
biclique (see the details in the extended version [?]).

2 Splice-and-Cut Attacks and Bicliques 

Splice-and-cut attacks [?] were designed as a preimage search method. A
simple splice-and-cut attack is applied to the Davies-Meyer-based compression
function $F$:

    $F(CV, M) = E_M(CV) \oplus CV$,

where $CV$ is a chaining value, $M$ is a message block, and $E_K(\cdot)$ is a
block cipher. An attacker is given an $n$-bit hash value $H$ and has to find a
preimage $M$. The preimage search is organized as follows. The attacker
partitions the message space into sets, each represented as a two-dimensional
array of messages $\{M[i,j]\}$, and processes each set independently. Given
$\{M[i,j]\}$, he selects an internal state $S$ and an internal variable $v$
such that $v$ as a function of $S$ in one direction does not depend on $i$,
and in the other direction does not depend on $j$:

    [Figure: splice-and-cut layout; labels CV, S, M[*,j], H]


Then he assigns $S$ an arbitrary value and computes $v$ in the forward
direction for all possible $j$ (denoted by $\{\overrightarrow{v}_j\}$) and in
the other direction for all possible $i$ (denoted by
$\{\overleftarrow{v}_i\}$), computing $CV$ and using $H$ on the way. The
overlap of the resulting two sets yields preimage candidates, which are tested
on the full state width. The indices $i$ and $j$ typically belong to
$[0; 2^d - 1]$ for some $d$, which yields the matching probability $2^{2d-n}$
for a single set and the complexity $2^{n-d}$ for the pseudo-preimage search.
To find a full preimage the adversary generates $2^{d/2}$ pseudo-preimages,
computes $2^{n-d/2}$ CVs out of the initial value, and checks for a matching
pair. The total complexity is $2^{n-d/2+1}$ without optimizations (which are
not always possible), so only $d \geq 3$ provides an advantage over brute
force.
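
The exponents in this paragraph can be tabulated with a short script. This is only a sketch of the arithmetic above; the function names are illustrative and do not come from the paper, and exact fractions are used so that half-integer exponents are not rounded.

```python
from fractions import Fraction


def pseudo_preimage_log2(n: int, d: int) -> Fraction:
    """log2 of the pseudo-preimage search cost, 2^(n-d)."""
    return Fraction(n - d)


def full_preimage_log2(n: int, d: int) -> Fraction:
    """log2 of the unoptimized conversion cost, 2^(n - d/2 + 1)."""
    # 2^(d/2) pseudo-preimages at 2^(n-d) each, plus 2^(n-d/2) chaining values.
    return Fraction(n) - Fraction(d, 2) + 1


def beats_brute_force(n: int, d: int) -> bool:
    """True if the full preimage attack is cheaper than 2^n."""
    return full_preimage_log2(n, d) < n


if __name__ == "__main__":
    n = 512
    for d in range(1, 6):
        print(d, full_preimage_log2(n, d), beats_brute_force(n, d))
```

For $d = 2$ the exponent equals $n$ exactly, which is why the advantage only starts at $d = 3$.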

The basic attack was carried over to other modes and even to block ciphers. For
the latter, the encryption oracle plays the role of the feedforward to link the 
input and the output. 

A biclique is an extension for the first step of the attack, which is based
upon an earlier informal concept of initial structure [?]. Instead of a single
state $S$, a biclique is defined over a sub-cipher (a part of the primitive,
typically several rounds long) and for a particular group of keys or messages
that are subject to test. A biclique over $f$ for parameters $\{M[i,j]\}$ is a
pair of state sets $\{Q_i\}$, $\{P_j\}$ such that

    $Q_i \xrightarrow[f]{M[i,j]} P_j$.    (1)

A biclique tests the parameters $\{M[i,j]\}$ in the same way as in the basic
attack. The matching variable $v$ is computed in both directions, from $P_j$
and from $Q_i$.

Condition (1) guarantees that if $M[i,j]$ is a preimage then the computations
from $P_j$ and $Q_i$ meet in a biclique exactly as at the matching point.

The crucial property of a biclique is that it enumerates $2^{2d}$ parameters
with only $2^{d+1}$ internal states. The value $d$ is called the dimension of
a biclique, and the number of rounds in $f$ is called the length of a
biclique.

The computational advantage of a biclique attack is the same as in the basic
attack, and hence is proportional to the dimension.

3 Bicliques for Permutations

The simplest way to turn a permutation into a preimage-resistant function is to
xor the input to the output:

    $F(x) = E(x) \oplus x$.    (2)

Our goal is to construct a preimage search algorithm which recovers $x$ from a
given $H = F(x)$. We proceed as follows.

Using a specific algorithm, we select a sub-permutation $g$ within $E$ and an
internal state $V$ in $E$ but not in $g$. Denote the input state of $g$ by $Q$
and the output state of $g$ by $P$. We partition the space of all states into
sets $\{Q_{i,j}\}$, which we represent as two-dimensional arrays of states.
Here $i, j$ are $d$-bit values for some $d$. We test each set independently
for a state that corresponds to a valid preimage $x$. Let us denote the
$g$-image of $Q_{i,j}$ by $P_{i,j}$:

    $Q_{i,j} \xrightarrow{g} P_{i,j}$.

We will explain how to choose $g$ and the partition of $Q$ in the subsection
"Construction Algorithms", and it will also become clear why we use two
indices to enumerate the states $Q$.

The rest of this section is devoted to finding an improved way to test a single
set of states. A straightforward way to check if one of $\{Q_{i,j}\}$ is a
solution to (2) is to compute the state $V$ twice for each $i, j$. First,
compute $V$ as a function of $P$ in the forward direction; let us denote this
computation by $\overrightarrow{f}$. Second, compute $V$ as a function of $Q$
in the backward direction: computing $x$, then $E(x) = H \oplus x$, and then
$V$; let us denote this computation by $\overleftarrow{f}$. Hence we check if

    $\exists i,j: \overrightarrow{f}(P_{i,j}) = \overleftarrow{f}(Q_{i,j})$.    (3)

This algorithm is equivalent to the exhaustive search and requires $2^{2d}$
computations of $E$.

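The complexity can be reduced as follows. Let $v \subset V$ be an internal
variable, and let $\overrightarrow{f}_v$ and $\overleftarrow{f}_v$ be the
projections of $\overrightarrow{f}$ and $\overleftarrow{f}$, respectively, to
$v$. We say that the states $Q_{i,j}$ and $P_{i,j}$ form a sliced biclique if
the following conditions hold:

    $\forall i,j: \overrightarrow{f}_v(P_{i,j}) = \overrightarrow{f}_v(P_{0,j}); \quad \overleftarrow{f}_v(Q_{i,j}) = \overleftarrow{f}_v(Q_{i,0})$.

Therefore, the necessary condition in Equation (3) can be reformulated as
follows:

    $\exists i,j: \overrightarrow{f}(P_{i,j}) = \overleftarrow{f}(Q_{i,j}) \Rightarrow \exists i,j: \overrightarrow{f}_v(P_{i,j}) = \overleftarrow{f}_v(Q_{i,j}) \Leftrightarrow$
    $\Leftrightarrow \exists i,j: \overrightarrow{f}_v(P_{0,j}) = \overleftarrow{f}_v(Q_{i,0})$.    (4)

Let us denote $\overrightarrow{f}_v(P_{0,j})$ by $\overrightarrow{v}_j$ and
$\overleftarrow{f}_v(Q_{i,0})$ by $\overleftarrow{v}_i$. Hence one of
$\{Q_{i,j}\}$ can be a solution only if

    $\exists i,j: \overrightarrow{v}_j = \overleftarrow{v}_i$.    (5)

To check this we need to call $\overrightarrow{f}_v$ and $\overleftarrow{f}_v$
$2^d$ times each, which costs less than $2^d$ calls of $E$. The computations
are depicted in Figure 1. Each matching candidate yields a pair $(i,j)$ and
the state $Q_{i,j}$, which we retest as a preimage candidate. For the full
attack we need to partition the full input domain into groups of size $2^{2d}$
and construct bicliques for them.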

If the complexity of constructing a biclique and retesting the false alarms is
small compared to $2^d$, then $2^{2d}$ states are tested with complexity $2^d$,
and the set of all $n$-bit states is tested with complexity $2^{n-d}$. In most
of our attacks we test only a subset of states of cardinality $2^r$ with
complexity $2^{r-d}$. The parameter $d$ is called the dimension of a sliced
biclique.
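
The matching step of Equation (5) is an ordinary meet-in-the-middle table lookup. The sketch below is a generic illustration, not the authors' implementation: `forward_v` and `backward_v` are hypothetical callables standing for $j \mapsto \overrightarrow{v}_j$ and $i \mapsto \overleftarrow{v}_i$, and the demo values are made up.

```python
from collections import defaultdict
from typing import Callable, List, Tuple


def match_candidates(forward_v: Callable[[int], int],
                     backward_v: Callable[[int], int],
                     d: int) -> List[Tuple[int, int]]:
    """Return all (i, j) with forward_v(j) == backward_v(i), using 2 * 2^d calls."""
    table = defaultdict(list)
    for j in range(1 << d):              # 2^d forward computations
        table[forward_v(j)].append(j)
    candidates = []
    for i in range(1 << d):              # 2^d backward computations
        for j in table.get(backward_v(i), ()):
            candidates.append((i, j))    # Q_{i,j} must be retested as a preimage
    return candidates


if __name__ == "__main__":
    d = 3
    fwd = lambda j: (3 * j + 1) % 16     # made-up values for the demo
    bwd = lambda i: (5 * i) % 16
    print(match_candidates(fwd, bwd, d))
```

The table covers all $2^{2d}$ index pairs while each projection is evaluated only $2^d$ times, which is where the factor $2^d$ over exhaustive search comes from.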





Fig. 1. Sliced biclique for a permutation


Construction Algorithms. Let us describe a construction algorithm for sliced
bicliques, and then discuss its modifications. First we choose (below we will
explain how) a state $Q_{0,0}$ and two sets of differences $\{\Delta_j\}$ and
$\{\nabla_i\}$, $i, j > 0$. We construct a biclique where

    $Q_{i,j} = Q_{i,0} \oplus \Delta_j$;    (6)
    $P_{i,j} = P_{0,j} \oplus \nabla_i$.    (7)

We proceed as follows:

1. Compute $P_{0,0} \leftarrow g(Q_{0,0})$.

2. Set $Q_{0,j} \leftarrow Q_{0,0} \oplus \Delta_j$, compute $P_{0,j}$ for all $j > 0$.

3. Set $P_{i,j} \leftarrow P_{0,j} \oplus \nabla_i$, compute $Q_{i,j}$ for all $i, j$.

Hence Equation (7) is fulfilled by definition, and we need to prove Equation
(6). We claim that it is fulfilled if the states $Q_{0,0}$, $Q_{i,0}$,
$Q_{i,j}$, $Q_{0,j}$ form a boomerang quartet [?] over $g$ with differences
$\Delta_j$ and $\nabla_i$, as demonstrated in Figure 2 a).

Indeed, $Q_{0,j} = Q_{0,0} \oplus \Delta_j$ by definition. We also have

    $P_{i,j} = g(Q_{0,j}) \oplus \nabla_i$;  $P_{i,0} = g(Q_{0,0}) \oplus \nabla_i$.

Therefore, $g^{-1}(P_{i,j}) \oplus g^{-1}(P_{i,0}) = Q_{i,j} \oplus Q_{i,0} = \Delta_j$ if

    $(Q_{0,0}, Q_{i,0}, Q_{i,j}, Q_{0,j})$ is a boomerang quartet.    (8)

In order to figure out sufficient conditions for the latter statement to hold,
we consider two groups of differential trails. The trails in the first group
are called $\Delta$-trails and describe the evolution of the differences
$\Delta_j$:

    $Q_{i,0} \xrightarrow{g} P_{i,0}, \quad Q_{i,j} \xrightarrow{g} P_{i,j} \quad \Rightarrow \quad \Delta_j \to P_{i,0} \oplus P_{i,j}$.

The trails in the second group are called $\nabla$-trails and describe the
evolution of the differences $\nabla_i$:

    $P_{0,j} \xrightarrow{g^{-1}} Q_{0,j}, \quad P_{i,j} \xrightarrow{g^{-1}} Q_{i,j} \quad \Rightarrow \quad \nabla_i \to Q_{0,j} \oplus Q_{i,j}$.

As proved in [?], Condition (8) holds with probability 1 if the $\Delta$- and
$\nabla$-trails share no active nonlinear elements (Figure 2 b)). Such
bicliques are said to be based on non-interleaving trails. A straightforward
way to achieve this property is to select $\Delta_j$ and $\nabla_i$ so that
their diffusion is minimal. A more sophisticated approach is to choose the
state $Q_{0,0}$ so that the diffusion is minimal.

If the $\Delta$- and $\nabla$-trails share nonlinear elements, we say that such
bicliques are based on interleaving trails.
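
The three construction steps and the role of non-interleaving trails can be tried out on a toy example. Everything in the sketch below is an assumption made for illustration: the state is two 8-bit words, `g` applies two made-up S-boxes to the words separately (so a difference in one word never meets a difference in the other), and `g_mix` is a second made-up map in which both trails pass through the same modular addition.

```python
from typing import Callable, Dict, List, Tuple

State = Tuple[int, int]                      # toy state: two 8-bit words


def xor(a: State, b: State) -> State:
    return (a[0] ^ b[0], a[1] ^ b[1])


def invert(table: List[int]) -> List[int]:
    inv = [0] * len(table)
    for x, y in enumerate(table):
        inv[y] = x
    return inv


# Two toy 8-bit S-boxes (bijective because the multipliers are odd).
SA = [(7 * x + 3) % 256 for x in range(256)]
SB = [(13 * x + 1) % 256 for x in range(256)]
SA_INV, SB_INV = invert(SA), invert(SB)


def g(q: State) -> State:
    """Toy sub-permutation: the two words pass through separate S-boxes."""
    return (SA[q[0]], SB[q[1]])


def g_inv(p: State) -> State:
    return (SA_INV[p[0]], SB_INV[p[1]])


def g_mix(q: State) -> State:
    """Toy map in which both words meet in one modular addition."""
    return ((q[0] + q[1]) % 256, q[1])


def g_mix_inv(p: State) -> State:
    return ((p[0] - p[1]) % 256, p[1])


def build_sliced_biclique(q00: State, deltas: List[State], nablas: List[State],
                          perm: Callable[[State], State],
                          perm_inv: Callable[[State], State]):
    """Steps 1-3; deltas[0] and nablas[0] must be the zero difference."""
    Q: Dict[Tuple[int, int], State] = {}
    P: Dict[Tuple[int, int], State] = {}
    for j, delta in enumerate(deltas):
        Q[0, j] = xor(q00, delta)            # steps 1-2: Q_{0,j}, then P_{0,j}
        P[0, j] = perm(Q[0, j])
    for i, nabla in enumerate(nablas):
        for j in range(len(deltas)):
            P[i, j] = xor(P[0, j], nabla)    # step 3, Equation (7)
            Q[i, j] = perm_inv(P[i, j])
    return Q, P


def equation6_holds(Q, deltas: List[State], nablas: List[State]) -> bool:
    """Check Q_{i,j} = Q_{i,0} xor Delta_j for all i, j."""
    return all(Q[i, j] == xor(Q[i, 0], deltas[j])
               for i in range(len(nablas)) for j in range(len(deltas)))


if __name__ == "__main__":
    deltas = [(j, 0) for j in range(4)]      # Delta active in word 0 only
    nablas = [(0, i) for i in range(4)]      # Nabla active in word 1 only
    Q, _ = build_sliced_biclique((0x5A, 0xC3), deltas, nablas, g, g_inv)
    print(equation6_holds(Q, deltas, nablas))        # non-interleaving trails
    Q, _ = build_sliced_biclique((0, 0), deltas, nablas, g_mix, g_mix_inv)
    print(equation6_holds(Q, deltas, nablas))        # interleaving trails
```

With `g` the check of Equation (6) succeeds for every choice of $Q_{0,0}$, while with `g_mix` it fails already for $Q_{0,0} = (0, 0)$, because the addition is an active nonlinear element of both trails.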



Fig. 2. Differential properties of sliced biclique

4 Framework of New Preimage and Collision Attacks on Skein

The SHA-3 finalist Skein [?] employs the Matyas-Meyer-Oseas mode to construct
a compression function. It takes the block cipher Threefish (denoted by
$E_K(\cdot)$) and computes:

    $F(CV, T, M) = E_{CV,T}(M) \oplus M$,

where $CV$ is the chaining value, and $T$ is the tweak value. Due to
difficulties in mounting collision attacks on the MMO mode, the only published
attack on the Skein hash function is the preimage attack [?] based on regular
bicliques. The parameter $M[i,j]$ in the biclique equation (1) is the chaining
value. As a result, in the preimage attack on the compression function the
attacker has to work with multiple CVs to get a pseudo-preimage. A full
preimage requires another meet-in-the-middle procedure (Section 2). The first
step must have complexity $2^{n-3}$ or smaller to yield an advantage over brute
force, which implies that only bicliques of dimension 3 or larger should be
used.

Equipped with the concept of sliced bicliques, we can fix the chaining value
and attack the permutation $E_{IV}(\cdot)$. Hence we can generate full
preimages without the pseudo-preimage step. The complexity drops to $2^{n-d}$
instead of $2^{n+1-d/2}$, and the restrictions on the biclique dimension no
longer apply. Meet-in-the-middle attacks on the first call of the MMO and
similar modes exist [?], but do not use the long biclique approach yet, and
have not been applied to Skein.

Collision Attacks. A more interesting property of the MMO mode comes out if
we consider a very recent pseudo-collision attack which uses regular bicliques
[?]. The method produces pseudo-collisions out of biclique preimage attacks as
follows. Assume we have a biclique of dimension $d$ and are able to match
deterministically on some $l$ hash value bits. Then the adversary generates
partial pseudo-preimages to a hash value with these $l$ bits equal to an
arbitrarily chosen constant $h$. Hence $2^{2d-l}$ $l$-bit partial
pseudo-preimages to $h$ can be generated with cost $2^d$. Note that they
collide on $l$ output bits. The adversary generates $2^{n/2-l/2}$ such
preimages and expects a pair of them to collide on the remaining $(n - l)$
bits by the birthday paradox. Since chaining values and schedule inputs are not
fixed in the attack, this yields a pseudo-collision with the expected
complexity $2^{(n/2-l/2)+(d)-(2d-l)} = 2^{n/2+l/2-d}$. The approach works for
both the DM and MMO modes.

The optimal $d$ satisfies the equation $d = 2d - l$, which implies $d = l$. The
attack is optimal if all preimages are generated out of a single biclique,
which implies

    $l = n/2 - l/2 \iff l = n/3$.

Hence the minimum complexity of collision search is $2^{n/3}$.
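
The exponent arithmetic of this framework is easy to check numerically. The sketch below only restates the formulas above with exact fractions; the function names are assumptions introduced for illustration.

```python
from fractions import Fraction


def collision_log2(n: int, l: int, d: int) -> Fraction:
    """log2 of 2^{(n/2 - l/2) + d - (2d - l)} = 2^{n/2 + l/2 - d}."""
    needed = Fraction(n - l, 2)      # log2 of partial preimages needed (birthday)
    per_biclique = 2 * d - l         # log2 of partial preimages per biclique
    return needed + d - per_biclique


def single_biclique_optimum(n: int) -> Fraction:
    """d = l and l = n/2 - l/2, i.e. l = n/3, giving complexity 2^{n/3}."""
    l = Fraction(n, 3)
    return Fraction(n, 2) + l / 2 - l


if __name__ == "__main__":
    print(collision_log2(512, 64, 64))       # d = l = 64 on a 512-bit state
    print(collision_log2(512, 128, 128))     # d = l = 128 on a 512-bit state
    print(single_biclique_optimum(256))      # n/3 for a 256-bit state
```

For $d = l$ the formula reduces to $2^{(n-l)/2}$, which is the form used for the concrete Skein complexities later in the paper.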

Again, the chaining value can be fixed in the MMO mode if we apply the
sliced biclique concept. Then we can generate real collisions instead of
pseudo-collisions. However, we can break fewer rounds compared to the
pseudo-collision attacks. The reason is the diffusion of the differences
$\Delta$ and $\nabla$ to the whole state while computing $v$, whereas in the
regular biclique attack the effect of those differences is postponed.
Nevertheless, our approach is more interesting, since real collision attacks
are considered a much stronger setting than pseudo-collisions (cf. collisions
for MD5).

Memory. The straightforward version of the attack requires storing all the
generated pseudo-preimages, which makes the memory complexity of the same
order as the time complexity. However, as the preimage step is
non-deterministic, we can employ memoryless collision search methods [?],
which multiply the time complexity by a small constant. Therefore, all the
attacks described later in the text, except for the marginal ones, have
memoryless equivalents.


5 Collision Attacks on Skein 

Here we present the first collision attacks on the reduced Skein hash function. 
The MMO mode is considered to be difficult for collision search, since most 
methods require a fixed chaining value when attacking the compression function. 
Since the round injections in the MMO mode come from the chaining value, 
the cryptanalyst has no freedom there, and hence is unable to construct local 
collisions, apply message modification techniques, etc. As a result, previous
attacks on Skein dealt with the compression function only. The attacks are
grouped according to the number of rounds covered by a biclique. Though we aim
for the maximal dimension and number of rounds attacked, for clarity we do not
push the concept to the extreme and try to avoid complicated bicliques. Hence
our attacks can be improved in the future.

Short Description of Skein. We consider three members of the Skein hash
function family: Skein-512, Skein-256, and Skein-512-256. Skein-512 [?]
operates on an internal state of eight 64-bit words, while Skein-256 works
with a state of four words. We denote the state words by
$S^0, S^1, \ldots, S^7$. All the versions have 72 rounds. Skein-512-256 merely
truncates the output of Skein-512 to 256 bits. Each round of Skein-512 consists
of four (two in Skein-256) simple transformations called MIX:

    $y_0 = x_0 + x_1 \pmod{2^{64}}$;
    $y_1 = (x_1 \lll R_{(d \bmod 8)+1,\,j}) \oplus y_0$,

where $R$ is a constant depending on the round number $d$. The invocations of
MIX are followed by a word permutation and, every four rounds, also by an
addition of a linear function of the chaining value and the tweak (constants in
our scenario).
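
For readers who want to experiment with the diffusion properties discussed below, the MIX transformation can be sketched as follows. The rotation amount `r` is passed in as a parameter because the actual table of rotation constants is not reproduced in this text; `mix_inv` is the inverse that follows directly from the two equations above. The function names are illustrative.

```python
MASK64 = (1 << 64) - 1


def rotl64(x: int, r: int) -> int:
    """Rotate a 64-bit word left by r positions."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK64


def rotr64(x: int, r: int) -> int:
    """Rotate a 64-bit word right by r positions."""
    return rotl64(x, (64 - r) % 64)


def mix(x0: int, x1: int, r: int) -> tuple:
    """y0 = x0 + x1 mod 2^64; y1 = (x1 <<< r) xor y0."""
    y0 = (x0 + x1) & MASK64
    y1 = rotl64(x1, r) ^ y0
    return y0, y1


def mix_inv(y0: int, y1: int, r: int) -> tuple:
    """Inverse of mix: recover x1 from y1 xor y0, then x0 by subtraction."""
    x1 = rotr64(y1 ^ y0, r)
    x0 = (y0 - x1) & MASK64
    return x0, x1


if __name__ == "__main__":
    y = mix(0x0123456789ABCDEF, 0xFEDCBA9876543210, 14)  # 14 is an arbitrary demo value
    print(mix_inv(*y, 14) == (0x0123456789ABCDEF, 0xFEDCBA9876543210))
```

The only nonlinear component is the 64-bit addition, which is why a difference confined to a single word can stay separated from other differences for a round or two.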

The only published attack on the Skein hash function is a preimage attack [?]
on 22 rounds of Skein-512.

5.1 Skein-512

As few as three rounds of Skein-512 are required to diffuse the contents of a
single word to the full state. As a result, the bicliques based on
non-interleaving trails are likely to cover two rounds only. We present
bicliques of a different kind that are capable of covering up to 4 rounds, and
give some hints on how to construct longer bicliques.

2-Round Biclique. Our first examples deal with short bicliques of high
dimension. As a result, the attacks have a significant advantage over brute
force. We use an additional enumeration of rounds in a biclique, starting
with 0.

We use the algorithm from Section 3 with non-interleaving trails. We choose
an arbitrary $Q_{0,0}$ and construct bicliques of dimension 64 out of the
following differences $\Delta$ and $\nabla$:

    $\Delta_j$ = |0|0|0|0|0|j|0|0| after MIX in round 0 of the biclique, $j = 1 \ldots 2^{64} - 1$;
    $\nabla_i$ = |0|0|0|0|0|0|i|0| after MIX in round 2 of the biclique, $i = 1 \ldots 2^{64} - 1$.

It is easy to check that the $\Delta$- and $\nabla$-trails do not share active
nonlinear components and hence produce a biclique (Figure 3).



Fig. 3. Non-interleaving differential trails in a sliced biclique of dimension 64 in 
Skein-512 


Only three rounds are required to diffuse a 64-bit word onto the full state.
Hence we expect the matching part to be two rounds long in both directions. A
straightforward attack on 6 rounds uses a biclique in rounds 2-4 of Skein and
word $S^1$ of the output of the 6-round transformation as the matching variable
$v$ (Figure 4).

    [Depends on j] --> v <-- [Depends on i]

Fig. 4. Matching in 6-round Skein-512


However, we extend it by one round with the idea of indirect partial matching
[1]. Consider the state word $S^0$ after round 6 as a function of $P_{i,j}$. It
is easy to check that

    $P_{i,j} = P_{0,j} \oplus \nabla_i \Rightarrow S^0(P_{i,j}) = S^0(P_{0,j}) + i$.

Therefore, if we set $v = S^0$, we get
$\overrightarrow{v}_{i,j} = \overrightarrow{v}_{0,j} + i$. As a result,

    $\exists i,j: \overrightarrow{v}_{i,j} = \overleftarrow{v}_{i,j} \iff \exists i,j: \overrightarrow{v}_j + i = \overleftarrow{v}_i$,

which can be checked with complexity $2^{64}$. Hence we generate $2^{64}$ 64-bit
partial preimages with cost about $2^{64}$.

To produce new bicliques, we generate a new $Q_{0,0}$ and repeat the procedure.
Full 7-round collisions are found within $2^{(512-64)/2} = 2^{224}$ partial
preimages with cost $2^{224}$. Since the total number of states $Q$ needed for
the attack is less than $2^{256}$, it is unlikely that two identical states are
produced.

Collisions on fewer rounds can be found with bicliques of dimension 128.
These bicliques are two rounds long, but the diffusion in the matching part
takes one round less in each direction, which gives only a 5-round collision.
The complexity is $2^{192}$.

3-Round Biclique. If we decrease the dimension to 20 and lower, the diffusion
takes more than three rounds. As a result, we can construct 3-round bicliques
of dimension close to 20. We use the algorithm with non-interleaving trails
with some modifications.

First, we carefully choose the position of the biclique in the compression
function and the bits where the difference is applied in $Q$ and $P$. Since the
rotation constants in each MIX function are distinct, the diffusion properties
may change significantly when we shift the biclique over rounds and the active
bits over the 64-bit word. The best configuration we have found places the
biclique in rounds 5-7 (or $8k$ rounds further, because the rotation constants
repeat every 8 rounds), where the states $Q$ are defined before the MIX
operation in round 5, and the states $P$ are defined after the MIX operation in
round 7. The $\Delta$- and $\nabla$-differences are defined as follows:

    $\Delta_j$ = |0|0|0|0|0|0|$j \ll 45$|0|, $j = 1 \ldots 2^{19} - 1$;
    $\nabla_i$ = |0|i|0|0|0|0|0|0|, $i = 1 \ldots 2^{19} - 1$.

For $d < 19$ we simply set the most significant bits of $\Delta$ and $\nabla$
to zero. We additionally require that the 45 least significant bits of the word
$S^6$ in $Q$ be equal to 0 in order to prevent the trails from interleaving.
There is no other restriction on $Q$, so we can generate the states $Q_{0,0}$
in message sets simply by changing the words $S^0, \ldots, S^3$. Since we need
less than $2^{256}$ states, it is unlikely that there would be a collision.
This configuration produces a 3-round sliced biclique. Note that reducing the
dimension does not make the trails interleave.

The length of the matching part decreases as the dimension grows. We have
checked the diffusion on a PC and figured out that the matching part covers 7
rounds for $d = 17$. In this configuration we match on bits 30-33 of word $S^2$
and bits 20-32 of word $S^3$ of the input to the compression function. The
matching is not deterministic, as for some bits the difference is equal to zero
with probability $p_i < 1$. We have calculated $\prod_i p_i = 0.6$ and
conclude that with probability 0.4 we miss a solution (a type-I error).
Therefore, the total complexity is about two times larger compared to the
deterministic case and is equal to about $2^{248}$.
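
The figure of about $2^{248}$ can be reproduced with a rough calculation. The sketch below rests on two assumptions that are not spelled out in the text: that the number of matched bits equals the 17 bits listed above (4 bits of $S^2$ plus 13 bits of $S^3$), and that missing a solution with probability $1 - p$ scales the work by $1/p$. The function name is illustrative.

```python
import math


def collision_log2_with_misses(n: int, l: int, d: int, p_match: float) -> float:
    """log2 complexity n/2 + l/2 - d, inflated by 1/p_match for missed solutions."""
    deterministic = n / 2 + l / 2 - d        # exponent from the framework of Section 4
    return deterministic + math.log2(1 / p_match)


if __name__ == "__main__":
    # d = 17 with 17 matched bits and success probability 0.6 (values from the text)
    print(round(collision_log2_with_misses(512, 17, 17, 0.6), 2))
```

Under these assumptions the penalty is a factor of $1/0.6 \approx 1.67$, which is the "about two times larger" mentioned above.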

For $d = 10$ the matching part takes 8 rounds. The matching variable consists
of bits 17-21 of word $S^0$ and bits 24-31 of word $S^2$. The type-I error
probability does not exceed 0.2, and the total complexity is $2^{251}$. The
other values are given in Table 1.

4-Round Biclique. A regular biclique in the preimage attack on Skein [?] covers
4 rounds with two key additions. If we consider these rounds without the key
addition, we get exactly a sliced biclique of the same dimension. The diffusion
in the matching part will be slightly different because of the rotation
constants, but we can still bypass 10 rounds. Though the biclique construction
is quite expensive ($2^{209}$), there are 815 bit degrees of freedom left, of
which 303 are in the internal state. We propose to use this freedom to amortize
the biclique construction cost and generate new $Q_{0,0}$, so that a 14-round
partial preimage is found with complexity $2^3$. Full collisions are found with
complexity $2^{254.5}$.

Longer Bicliques. Bicliques of dimension 1 can be constructed up to 8 rounds, 
but the advantage over brute-force attacks is really marginal. Another problem 
is that the construction cost of a single biclique is very high, and we are unaware 
of how to exploit the degrees of freedom over so many rounds given no freedom 
in the injections.

Table 1. Collision attacks on reduced Skein with large memory requirements
(close to the computational complexity). Memoryless attacks add a small
constant to the exponent.

        Skein-256                 Skein-512
    Rounds   Complexity       Rounds   Complexity
       2     $2^{85}$            5     $2^{192}$
       4     $2^{96}$            7     $2^{224}$
       8     $2^{120}$          10     $2^{248}$
       9     $2^{124}$          11     $2^{251}$
      12     $2^{126.5}$        14     $2^{254.5}$


5.2 Skein-256 

Diffusion in Skein-256 is generally faster, because the internal state consists of 
four words only. Typically it takes one round less to affect the whole state. 
As a result, non-interleaving biclique trails and the matching part are shorter. 
We figured out that collision attacks on Skein-256 with bicliques of the same
dimension lag 2-3 rounds behind the attacks on Skein-512. For instance,
bicliques of dimension 64 and 128 cover one round only, and the matching part
is two rounds shorter. This results in 2-round collisions with complexity
$2^{85}$ and 4-round collisions with complexity $2^{96}$.

Bicliques of smaller dimension are found to be less sensitive to the smaller
state size. The low-dimension attacks for Skein-512 lose two rounds when
translated to Skein-256 (Table 1).

The biclique construction, including trail details and the partition of the
state space, is very similar to that in Skein-512, so we do not give many
details. The 2-round biclique yields 2- and 4-round attacks, which correspond
to the 5- and 7-round attacks on Skein-512. The 3-round biclique with dimension
17 yields an 8-round attack.

6 Certificational Preimage Attack on the Reduced Grøstl Output Transformation

In this section we present a certificational attack, i.e. one that has only a
small advantage over the exhaustive search, on Grøstl [?], a SHA-3 finalist
with a compression function not based on a block cipher. It invokes two
permutations $P$ and $Q$, both AES-based, and updates the chaining value $CV$
as follows:

    $CV \leftarrow CV \oplus Q(M) \oplus P(M \oplus CV)$,

where $M$ is a message block. The final call of the compression function is
followed by the output transformation

    $F(x) = \mathrm{Truncate}(x \oplus P(x))$,

where the truncation operation takes half of the state to get 256- and 512-bit
outputs. Hence Grøstl-256 operates on a 512-bit state and permutations $P$ and
$Q$, and Grøstl-512 operates on a 1024-bit state.

Permutations $P$ and $Q$ follow the AES design with very similar operations:
SubBytes, ShiftBytes, MixBytes (8-byte analogue of MixColumns), and
AddRoundConstant. The ShiftBytes operation in Grøstl-256 rotates the $i$-th row
by $i$ positions to the left; details of the other operations are irrelevant
for our attack. The sequence
SubBytes-ShiftBytes-MixBytes-AddRoundConstant-SubBytes is equivalent to 8 (for
Grøstl-256) parallel 64-bit Super S-boxes [?]. Due to the design simplicity,
Grøstl has been the target of numerous cryptanalytic attacks [?], though only
few of them violated the collision or preimage resistance of the hash function
[?]. The paper [?] addresses virtually the same problem as we do, and obtains
preimage attacks on the 5-round version of the compression function, including
a preimage attack on the 5-round output transformation.
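
The ShiftBytes rule stated above is simple enough to write down directly. The sketch below assumes an 8x8 byte state stored as a list of rows, with rows numbered from 0; this representation and the function names are choices made here for illustration and are not taken from the Grøstl specification.

```python
from typing import List

Row = List[int]


def shift_bytes(state: List[Row]) -> List[Row]:
    """Rotate row i of the byte state by i positions to the left."""
    return [row[i:] + row[:i] for i, row in enumerate(state)]


def shift_bytes_inv(state: List[Row]) -> List[Row]:
    """Rotate row i by i positions to the right (inverse of shift_bytes)."""
    return [row[len(row) - i:] + row[:len(row) - i] for i, row in enumerate(state)]


if __name__ == "__main__":
    state = [[8 * r + c for c in range(8)] for r in range(8)]
    shifted = shift_bytes(state)
    print(shifted[1])                        # row 1 moved one position to the left
    print(shift_bytes_inv(shifted) == state)
```

After this step each column collects one byte from every original column, which is what lets a single active byte spread to a full column through MixBytes in the following round.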

To run a preimage attack, and the first preimage attack in particular, it is
desirable to invert the output transformation of Grøstl. As it is also claimed
to be one-way, it serves as a natural target for sliced biclique attacks.

We adopt a differential view, as it provides a simple explanation of the attack
in terms of differential trails, making it similar to both rebound attacks [?]
and recent biclique attacks on AES. The main distinction is that there is no
round without a difference because there is no schedule. However, the
difference expansion in the outbound phase must be deterministic unless we have
additional degrees of freedom in the inbound phase.

Attack. We denote the internal states within 6 rounds from #1 to #13, as
depicted in Figure 5. We construct a sliced biclique of dimension 1 in states
#4-#9, which contains the Super S-box layer in states #5-#8. The matching
variable is a linear function of the variables in states #12 and #13 not affected
by Δ- and ∇-differences. The Δ-difference has a single active byte, marked in
light blue. Its influence on the internal states within the matching part is also
depicted in light blue in Figure 5. The ∇-difference and its influence are depicted
in green.

The matching condition is a linear function of the bytes not affected by the
differences. Let us elaborate on this statement. Consider the rightmost columns
of states #12 and #13 and denote them by A and B, respectively. Let us note
that B is a linear function of A. In turn, 7 bytes of A do not depend on i, and
7 bytes of B do not depend on j. If the state $Q_{i,j}$ corresponds to a preimage,
then a system of 8 − (8 − 7) − (8 − 7) = 6 linear equations should hold. All the
equations have the form

$$A_j \oplus B_i = 0,$$

which is easily transformed to Equation 0. 

We construct a sliced biclique based on interleaving trails. A biclique of
dimension 1 is equivalent to a single boomerang quartet (Figure 2, left). In contrast
to attacks on Skein, all the four relevant differences are distinct.

Fig. 5. Preimage attack on the reduced Grøstl-256 output transformation

Bicliques are constructed as follows. First, we arbitrarily choose
$\Delta_{0,1} \neq \Delta_{1,1}$ and $\nabla_{1,0} \neq \nabla_{1,1}$, which are all active in one byte
only, as specified earlier. We construct the states $\{Q_{i,j}\}_{i,j=0,1}$, which
satisfy the following equations:

$$Q_{0,0} \oplus Q_{0,1} = \Delta_{0,1}; \quad Q_{1,0} \oplus Q_{1,1} = \Delta_{1,1}; \tag{9}$$

$$P_{0,0} \oplus P_{1,0} = \nabla_{1,0}; \quad P_{0,1} \oplus P_{1,1} = \nabla_{1,1}. \tag{10}$$

First, we derive the differences in #5 and in #8. Then we reformulate Equations (9)
and (10) for each Super S-box, and solutions are found independently by exhaustive
search with a total complexity around $2^{70}$. The solutions are then concatenated.
For the details, we refer to the long biclique attack on AES [?], which gives a
description of an equivalent algorithm.

The complexity is amortized as follows. Each Super S-box has 7 inactive input
S-boxes. There exist $2^{56-8} = 2^{48}$ alternative values for them which do not affect
the active output S-box. Hence we can generate $2^{48 \cdot 8} = 2^{384}$ sliced bicliques out
of a single one. As the hash value contains 256 bits only, we have enough freedom
for the attack. For each biclique, i.e. $2^2$ states, we recompute only a portion of
the S-boxes in each round, with 2 · (8 + 16 + 2 + 7 + 56 + 8) = 194 S-boxes or
$2^{-3}$ calls of the permutation. Hence the amortized cost of a single state test is
$2^{-5}$, and the total attack complexity is $2^{251}$.
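
The exponent bookkeeping in this paragraph can be retraced with a short Python sketch. All quantities are base-2 logarithms. The variable names are ours, and the cost of $2^{-3}$ permutation calls per biclique is taken from the text as given rather than derived from an S-box count.

```python
# Exponent bookkeeping for the amortized cost (all values are log2).
# Variable names are illustrative and do not come from the paper.

inactive_sboxes = 7                    # inactive input S-boxes per Super S-box
free_bits = 8 * inactive_sboxes - 8    # 2^(56-8) alternative values
num_super_sboxes = 8                   # Groestl-256 has 8 parallel Super S-boxes
bicliques_log2 = free_bits * num_super_sboxes

# S-boxes recomputed per biclique, as listed in the text.
recomputed_sboxes = 2 * (8 + 16 + 2 + 7 + 56 + 8)

biclique_cost_log2 = -3                # "2^-3 calls of the permutation" (from the text)
states_per_biclique_log2 = 2           # a dimension-1 biclique covers 2^2 states
per_state_log2 = biclique_cost_log2 - states_per_biclique_log2

hash_bits = 256
total_log2 = hash_bits + per_state_log2

print("log2(#bicliques)     =", bicliques_log2)
print("recomputed S-boxes   =", recomputed_sboxes)
print("log2(cost per state) =", per_state_log2)
print("log2(total cost)     =", total_log2)
```

The available freedom (the biclique count) far exceeds the 256 bits of the hash value, which is the condition the paragraph relies on.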

7 Message Compensation 

The message compensation procedure [?] instructs how to select message
groups in the splice-and-cut attack in the case of a strong, nonlinear message
schedule. Existing applications are very ad hoc and complicated. It is possible,
however, to give a unified view on the message compensation problem and existing
solutions with bicliques for permutations. The majority of this section is left for
an extended version of this paper [?].

We propose the following algorithm for a generic message schedule. Suppose
you construct a biclique in rounds $N_1$-$N_2$, and want to describe a message set
$\{M[i,j]\}$ such that

1. Injections $W^{N_0}, W^{N_0+1}, \ldots, W^{N_1-1}$ do not depend on j;

2. Injections $W^{N_2+1}, W^{N_2+2}, \ldots, W^{N_3}$ do not depend on i.

We propose to construct a sliced biclique without a matching point in rounds
$N_1$-$N_3$. The difference $\Delta_j$ is defined before round $N_1$. To satisfy the first
condition, we assign the words of $\Delta_j$ that correspond to
$W^{N_0}, W^{N_0+1}, \ldots, W^{N_1-1}$ to zero. If some words are left undefined, then we
get freedom in these values and can use it to manipulate the difference
propagation in the Δ-trails.

We define $\nabla_i$ after round $N_3$. To satisfy the second condition, we assign the
words of $\nabla_i$ that correspond to $W^{N_2+1}, W^{N_2+2}, \ldots, W^{N_3}$ to zero. Again, if some
words are undefined, we keep this freedom.

Finally, we construct a sliced biclique based on non-interleaving trails. We
use undefined parts of $\Delta_j$ and $\nabla_i$ to control the diffusion on the word level, and
select M[0, 0] to control the diffusion, if necessary, on the bit level. We may also
choose other round indices for a biclique, if this makes the difference selection
clearer. We may also have to deal with other constraints like padding, which
further reduce the freedom in Δ and ∇. Finally, we may have to construct
a biclique based on interleaving trails, if non-interleaving ones are impossible
because of the diffusion.

8 Conclusions 

We have introduced sliced bicliques as a new tool for the analysis of permutations 
in the context of preimage and collision attacks. We have demonstrated that the 
advantage in the number of rounds from the long biclique idea can be obtained
also for permutations. The application of our concept to different designs has
interesting consequences.

First, our collision attacks on Skein demonstrate that the MMO mode may not
be as resistant to collision attacks, and to differential cryptanalysis in particular,
as it was considered to be. The foundation of our attacks is the new pseudo-collision
search technique that has been recently introduced. Though we employ some
elements of differential cryptanalysis, the details are completely different from 
the famous collision attacks on the SHA family. Hence we suppose that the 
potential of differential cryptanalysis for high-profile hash functions has not been 
exhausted. 

Secondly, our preimage attacks on the Grøstl output transformation show
that the concept of the Super S-box contributes not only to the biclique attacks
on designs with a key schedule (AES), but also to attacks on those without a
schedule. We expect this type of attack to progress alongside future
techniques for the Super S-box.

Finally, we explained the message compensation in the biclique terms. We 
expect that the designers of future meet-in-the-middle attacks on SHA-2 will be 
able to provide a compact two-step description of their results. First, a biclique 
in the schedule is constructed, and secondly, it is used to construct a biclique in 
the state. We are looking forward to new techniques that would combine these 
bicliques in an optimal way. 

We leave a significant number of targets for future work. 7-round Grøstl-256,
9- and 10-round Grøstl-512, Whirlpool, and BLAKE are natural targets.
Construction of bicliques of high dimension out of interleaving trails remains an
open problem.

Abstract. In this paper, improved cryptanalyses for the ISO standard
hash function Whirlpool are presented with respect to the fundamental
security notions. While a subspace distinguisher was presented on the full
version (10 rounds) of the compression function, its impact on the security
of the hash function seems limited. In this paper, we discuss the
(second) preimage and collision attacks for the hash function and the
compression function of Whirlpool. Regarding the preimage attack, 6
rounds of the hash function are attacked with $2^{481}$ computations while
the previous best attack is for 5 rounds with $2^{481.5}$ computations. Regarding
the collision attack, 8 rounds of the compression function are
attacked with $2^{120}$ computations, while the previous best attack is for
7 rounds with $2^{184}$ computations. To verify the correctness, especially
for the rebound attack on the Sbox with an unbalanced Differential
Distribution Table (DDT), the attack is partially implemented, and the
differences from attacking the Sbox with balanced DDT are reported.

Keywords: Whirlpool, preimage, collision, meet-in-the-middle,
guess-and-determine, local collision.


1 Introduction 

Hash functions play important roles in various aspects of modern cryptography.
Since the collision resistance of MD5 and SHA-1 was broken by
Wang et al. [?], cryptographers have looked for stronger hash function designs.
While various new designs are discussed in the SHA-3 competition [?], some
existing hash functions seem to be much stronger than the MD4-family.
Evaluating such hash functions is useful especially if they have been standardized
internationally.

For hash functions, three security notions are classically considered: Collision
Resistance, Second-Preimage Resistance, and Preimage Resistance. Besides,
cryptographers recently have considered various non-ideal properties. Although
considering such non-ideal properties is important especially for determining a
new standard, focusing on vulnerabilities that can be exploited in practice is
more important especially for evaluating hash functions in practice.

Whirlpool [?] is a 512-bit hash function proposed by Rijmen and Barreto in
2000. The compression function uses a 10-round AES based cipher with 8×8-byte
internal states, and the output is computed with the Miyaguchi-Preneel mode
[?, Algorithm 9.43]. Whirlpool has been adopted by ISO [?] and NESSIE [?].

Regarding the collision attack, the rebound attack proposed by Mendel et al.
[?] is very effective as a differential attack against AES based
structures. Indeed, Mendel et al. presented a 4-round collision attack on the
hash function and a 5-round collision attack on the compression function of
Whirlpool. Many improved techniques of the rebound attack have been devised
such as the start-from-the-middle technique [?], the linearized match-in-the-middle
technique [?], super-(S)box analysis [?], and the multiple-inbound technique [?].
Besides, for the AES based structure with an 8×8 state including Whirlpool, more
techniques have been proposed such as hyper-Sbox analysis [?], non-full-active
super-Sbox analysis [?], the efficient list-merging technique [?], and three inbound
rounds [?]. Several practical results are given for round-reduced algorithms and
intermediate rounds in [9,17,18]. This paper exploits the differences in both the
data processing part and the key schedule part. Some similarities can be seen in the
analysis of AES-256 [?] and two analyses of Grøstl [?].

Regarding the preimage attack, the meet-in-the-middle (MitM) attack with the
splice-and-cut technique proposed by Aoki and Sasaki [?] has been actively
discussed. Several papers proposed improved techniques [23,24]. For the preimage
attack against the AES based structure, Sasaki showed a second preimage attack
on 5 rounds of Whirlpool [?]. Later, Wu et al. improved its complexity and
extended it to the preimage attack [?]. Note that Bogdanov et al. showed an
attack on 10-round AES in hashing modes with the biclique technique [?].
Because this attack exploits the weakness in the AES key-schedule, the attack is
specific to AES and cannot be directly applied to other AES based primitives.

Our Contributions. In this paper, we improve cryptanalyses on Whirlpool
with respect to the fundamental security notions. The main results are a
6-round preimage attack on the hash function and an 8-round collision attack on
the compression function. The results are summarized in Table 1.

Our preimage attack is based on the previous 5-round MitM attacks [?].
The number of attacked rounds is extended by applying the guess-and-determine
approach during the MitM attack. Moreover, we increase the number of free
bits for each chunk by exploiting the freedom degrees of the key, while previous
attacks fix the key as a constant. More precisely, the key schedule function shares
the same round function with the data processing procedure, and thus we separate
the key schedule function in the same way as the data processing function.

Our collision attack is based on the rebound attack. We use the key difference 
to cancel the difference in the data part, while previous work avoided inserting 
differences to the key schedule. This leads to a differential path with a high 

Table 1. Summary of attack results


Type 

Target 

#Rounds 

Time 

Mem. 

Ref. 

Remarks 

Fundamental 

Preimage 

Hash 

Function 

5 

5 

5 

2 * 0 ±.o 

2 448 

2 465 

2 481 

2 504 

2 o* 

2 96 

0(1) 

2 256 

0(1) 

Ours 

Ours 

Ours 

Ours 

Memoryless MitM 

Memoryless MitM 

Second 

Preimage 

Hash 

Function 

5 

5 

5 

2 bu4 

2 448 

2 464 

2 481 

2 504 

5 

0(1) 

JSs 

Ours 

Ours 

Memoryless MitM 

Memoryless MitM 

Collision 

Hash 

Function 

5 

2 o* 

2 120 

2° 

2 64 

H 

nn 

semi-free-start 

semi-free-start 

free-start 

free-start 

Function 

\ 

2 i84 

2 120 

2 8 

2 64 

2 120 

2 b 

2 128 

2 8 

2 8 

2 8 

Ours 

Ours 

Ours 

Other 

Near-collision 

Function 

l 

2 Lf ° 

2 112 

2° 

2 128 

m 


Distinguisher 

Function 

10 

10 

2 166 

2 121 

2° 

2 128 

g 


probability. In this paper, we implement our 4-round collision attack which only
requires $2^8$ computations. Because all previous collision attacks require at least
$2^{64}$ computations even for a small number of rounds, this is the first example
of the collision for a reduced compression function. We also partially implement
the 7-round collision attack. We show an example of the 40-byte near-collision.

2 Specification and Notations 

Whirlpool [?] takes any message with less than $2^{256}$ bits as input, and outputs a
512-bit hash value. It adopts the Merkle-Damgård structure. The input message
M is padded into a multiple of 512 bits. In detail, the 256-bit binary expression
of the bit length ℓ is padded according to the MD-strengthening, i.e. $M \| 1 \| 0^* \| \ell$.
The padded message is divided into 512-bit blocks $M_0 \| M_1 \| \cdots \| M_{N-1}$. Let $H_n$
be a 512-bit chaining variable. First, an initial value IV is assigned to $H_0$. Then,
$H_{n+1} \leftarrow CF(H_n, M_n)$ is computed for n = 0, 1, ..., N − 1, where CF is a
compression function. $H_N$ is produced as the hash value of M. CF uses an AES
based block-cipher $E_k$, which takes a 512-bit chaining variable $H_i$ as a key and
a 512-bit message block $M_i$ as a plaintext. The output of CF is computed by
the Miyaguchi-Preneel mode, i.e. $E_{H_i}(M_i) \oplus M_i \oplus H_i$.
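
The iteration just described can be sketched in Python as follows. This is only an illustration of the mode of operation under stated assumptions: `toy_E` is a hypothetical stand-in built from SHA-512 and is not Whirlpool's block cipher, the all-zero IV is assumed for illustration, and the message is assumed to be byte-aligned so that the single '1' bit can be written as the byte 0x80.

```python
import hashlib

BLOCK = 64  # block and chaining-value size in bytes (512 bits)

def toy_E(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for the block cipher E_k (NOT Whirlpool's cipher)."""
    return hashlib.sha512(key + block).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def pad(msg: bytes) -> bytes:
    """MD-strengthening M || 1 || 0* || l with a 256-bit length field."""
    bitlen = 8 * len(msg)
    padded = msg + b"\x80"  # the '1' bit followed by seven '0' bits
    # Zero-fill so that the 32-byte length field ends on a block boundary.
    padded += b"\x00" * ((-len(padded) - 32) % BLOCK)
    return padded + bitlen.to_bytes(32, "big")

def compress(h: bytes, m: bytes) -> bytes:
    """Miyaguchi-Preneel mode: E_h(m) xor m xor h."""
    return xor(xor(toy_E(h, m), m), h)

def md_hash(msg: bytes, iv: bytes = bytes(BLOCK)) -> bytes:
    """Merkle-Damgard iteration H_{n+1} <- CF(H_n, M_n)."""
    h = iv
    padded = pad(msg)
    for off in range(0, len(padded), BLOCK):
        h = compress(h, padded[off:off + BLOCK])
    return h

print(md_hash(b"abc").hex())
```

Because both the message block and the chaining value are fed forward, inverting the compression function requires dealing with both XORs, which is why the attacks below treat the key (chaining value) and the data separately.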

Inside the block cipher an internal state is represented by an 8 × 8 byte
array. At first, $H_i$ is assigned to the key value $k_0$. Then, ten 512-bit subkeys
$k_1, \ldots, k_{10}$ are generated by the key-schedule function defined as follows:

$$k_{n+1} \leftarrow AC \circ MR \circ SC \circ SB(k_n), \quad \text{for } n = 0, 1, \ldots, 9.$$

- SubBytes (SB): applies the Substitution-Box to each byte.

- ShiftColumns (SC): cyclically shifts the j-th column downwards by j bytes.

- MixRows (MR): multiplies each row of the state matrix by an MDS matrix.

- AddRoundConstant (AC): XORs a 512-bit constant defined in the specification.

For the data processing part, $M_i$ is assigned to the plaintext p. Then, the
whitening operation is performed and the result is stored into a variable $s_0$,
i.e. $s_0 \leftarrow k_0 \oplus p$. The output $s_{10}$ of the block cipher is computed as follows,
where AddRoundKey (AK) takes the XOR with $k_{n+1}$.

$$s_{n+1} \leftarrow AK \circ MR \circ SC \circ SB(s_n), \quad \text{for } n = 0, 1, \ldots, 9.$$

Notations. Byte positions in a state #S are denoted by integer numbers 0, 1, ...,
63, where the byte 8j + i corresponds to the byte in the i-th row and the j-th
column of the state #S, and is denoted by #S[8j + i]. We denote the initial state
for the data processing part in round x by #Dx^I. Then, states immediately after
SB, SC, MR, and AK in round x are denoted by #Dx^SB, #Dx^SC, #Dx^MR, and
#Dx^AK, respectively. Obviously, #Dx^AK is identical with #D(x+1)^I. Similarly,
we use the notations #Kx^I, #Kx^SB, #Kx^SC, #Kx^MR, and #Kx^AC for the
key schedule part. We often denote several bytes of state #S by #S[a, b, ...],
e.g. 8 bytes in the rightmost column are denoted by #S[56, 57, ..., 63]. We also
use the following notations to denote specific byte positions.

- #S[row(i)]: 8 byte-positions in the i-th row of state #S

- #S[SC(row(i))]: 8 byte-positions obtained by applying SC to #S[row(i)]

- #S[SC^-1(row(i))]: 8 byte-positions obtained by applying SC^-1 to #S[row(i)]
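
The three position sets can be written out explicitly with a few helper functions. This is a sketch: the function names are ours, and "downwards" is assumed to mean that the row index increases, so that ShiftColumns moves the byte in row i, column j to row (i + j) mod 8 of the same column.

```python
def pos(i: int, j: int) -> int:
    """Index of the byte in row i and column j (position 8j + i)."""
    return 8 * j + i

def row(i: int) -> list:
    """Byte positions of the i-th row."""
    return [pos(i, j) for j in range(8)]

def sc_row(i: int) -> list:
    """Positions reached when SC is applied to row i (assumed shift direction)."""
    return [pos((i + j) % 8, j) for j in range(8)]

def sc_inv_row(i: int) -> list:
    """Positions reached when SC^-1 is applied to row i."""
    return [pos((i - j) % 8, j) for j in range(8)]

print(row(0))         # one byte per column, all in row 0
print(sc_row(0))      # a diagonal of the state
print(sc_inv_row(0))  # the opposite diagonal
```

Each of the eight sets SC(row(i)) holds exactly one byte per column, and together they cover all 64 positions. This is what lets the rebound attack below treat the eight groups #S^SB[SC^-1(row(i))] independently.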

3 Related Work 

3.1 Meet-in-the-Middle (Second) Preimage Attack on Whirlpool 

In FSE 2011, Sasaki proposed the first MitM preimage attack on AES-like primitives
[?]. Two main techniques were introduced: initial structure in an AES-like
permutation and partial-matching across a MixColumn operation. As a direct
application, a second preimage attack was found on the 5-round Whirlpool hash
function in [?]. In FSE 2012, Wu et al. improved the complexity of the 5-round second
preimage attack on Whirlpool [?] by exploiting more freedom degrees in the
data state. They successfully represented the chunk separations by several essential
integer parameters, and launched an automatic exhaustive search. Moreover,
they also proposed a method to deal with the message padding and extended
the attack into a first preimage attack.

3.2 Rebound Attack and Start-from-the-Middle Technique 

The rebound attack was introduced by Mendel et al. [?]. If it is applied to
Whirlpool, the 2-round path 8 → 64 → 8 can be satisfied only with $2^8$
computations. The path for rounds S and S + 1 is described in Fig. 1. First, an 8-byte
difference at #(S+1)^MR is randomly chosen, and it is propagated to #(S+1)^SB.
Then, a single-byte difference at one of the active bytes at #S^SB is randomly
chosen, and it is propagated to 8 bytes of #(S+1)^I. For each S-box in round
S + 1, randomly given input and output differences have solutions (paired values
conforming to the path) with probability about $2^{-1}$, and the average number of
solutions is 2. Hence, if we choose $2^8$ differences for the single byte at #S^SB, we
obtain $2^8$ solutions for the corresponding 8 S-boxes. By iterating it for 8 active
bytes at #S^SB, we obtain $2^8$ solutions for each i of #S^SB[SC^-1(row(i))].

Fig. 1. Rebound and start-from-the-middle techniques

The start-from-the-middle technique is an improved procedure for the rebound
attack, which was proposed by Mendel et al. [?]. It satisfies a 3-round differential
path with the same complexity as the rebound attack. After obtaining $2^8$ solutions
for each i of #S^SB[SC^-1(row(i))] with the rebound attack, each solution
is computed until #(S−1)^MR[SC^-1(row(i))]. For each i, 127 kinds of differences
are obtained at #(S−1)^MR. Then, a single-byte difference at #(S−1)^SB is chosen.
The attacker propagates it to #(S−1)^MR, and checks whether the 8-byte
difference can be produced from the solutions of the rebound attack. Because
there are 127 kinds of the differences for each i, the 8-byte differences can be
produced with probability about $2^{-8}$. Therefore, by choosing $2^8$ differences at
#(S−1)^SB, we expect to find the desired difference. In summary, the 3-round
differential path 1 → 8 → 64 → 8 can be satisfied with a complexity of $2^8$.

Note that the behavior of the S-box is explained based on the S-box of AES. 
Because the S-box of Whirlpool has a different property, the evaluation for AES 
cannot be applied to Whirlpool directly. We later discuss this issue in Sect. 5.4.
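
The figures quoted for the AES S-box (a solution exists with probability about $2^{-1}$, two solutions on average, and 127 reachable differences) can be reproduced from its Differential Distribution Table. The sketch below builds the AES S-box from its standard definition (inversion in GF(2^8) followed by the affine map) and counts the non-zero DDT entries for each non-zero input difference. The helper names are ours. The Whirlpool S-box is not modelled here.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_inv(x: int) -> int:
    """Multiplicative inverse in GF(2^8), with 0 mapped to 0."""
    if x == 0:
        return 0
    return next(y for y in range(1, 256) if gf_mul(x, y) == 1)

def rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF

def aes_sbox() -> list:
    box = []
    for x in range(256):
        s = gf_inv(x)
        box.append(s ^ rotl8(s, 1) ^ rotl8(s, 2) ^ rotl8(s, 3) ^ rotl8(s, 4) ^ 0x63)
    return box

def ddt_row(sbox: list, din: int) -> list:
    """Number of inputs x with S(x) xor S(x xor din) = dout, for every dout."""
    counts = [0] * 256
    for x in range(256):
        counts[sbox[x] ^ sbox[x ^ din]] += 1
    return counts

SBOX = aes_sbox()
rows = [ddt_row(SBOX, din) for din in range(1, 256)]
reachable = [sum(1 for c in r if c) for r in rows]   # possible output differences

print("reachable output differences per input difference:", set(reachable))
print("probability that a random pair (din, dout) has a solution:", reachable[0] / 256)
print("average number of solutions when one exists:", 256 / reachable[0])
```

For an S-box with an unbalanced DDT the per-row counts differ from row to row, so the same script, run with another S-box table, shows how far these averages deviate from the AES case.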


3.3 Distinguisher for the Full Whirlpool Compression Function 

Lamberger et al. proposed a distinguisher for the full Whirlpool compression
function [?]. The distinguished property is called subspace distinguisher.
The dimensions of the input and output differences are defined before the analysis
starts. The attacker aims to find paired values whose dimensions of differences
at input and output are lower than the defined ones. The core technique is running
the rebound attack (8 → 64 → 8) at two parts independently without
determining the key value. Then, two results are connected and a long differential
path (8 → 64 → 8 → 8 → 64 → 8) is satisfied by searching for an
appropriate key value. Although the distinguisher beautifully breaks the full-round
compression function, the impact is very limited. Nevertheless, collisions
on the compression function are generated with this technique for 7 rounds with
(Time, Memory) = ($2^{184}$, $2^8$) or ($2^{120}$, $2^{128}$).

3.4 Local Collision on AES-Like Primitives

For a distinguisher for AES-256, Biryukov et al. introduced differences to both
the key and the data, and used the difference of round keys to cancel the difference
of the internal states of the data processing by the AddRoundKey operation,
i.e. a local collision occurs [?]. Local collisions may help the attacker to
build a high probability differential path on AES-like primitives.

4 Preimage Attack on 6-Round Whirlpool 

4.1 Overview 

Our first and main result is introducing the guess-and-determine approach to the
MitM preimage attack on the Whirlpool hash function, which extends the attack by
one round. More specifically, during the independent chunk computation,
even one unknown input byte of MixRows makes all the 8 output bytes
unknown, which is heavily unbalanced. So a chunk can guess a small number
of unknown bytes in order to significantly increase the number of known bytes
in the following rounds. Thus the guess-and-determine approach is very effective for
the preimage attack on Whirlpool.

Our second result is exploiting the freedom degree in the key to increase the
number of free bits in each chunk, and thus reduce the complexity.
Since the key schedule of Whirlpool is the same as the data processing, we can
separate the key schedule and the data processing into two chunks in the same way,
which doubles the number of free bits in both chunks.

Our third result is that we propose not only a first preimage attack on the hash
function with the lowest complexity, but also a memoryless preimage attack.
Compared to the brute force attack, the second attack requires the same
memory and a lower complexity. This is achieved by finding a last block attack
first and then linking the chaining values with a fixed-key attack on the compression
function. Since both the last block attack and the fixed-key attack can be
implemented in a memoryless way [?], we obtain a memoryless first preimage
attack.


4.2 Preimage Attack on 6-Round Compression Function 

The chunk separation used in the 6-round attack is illustrated in Fig. 2. Five
different colors are used to indicate the categories of the bytes. The gray bytes are
constants which come from the hash/output value or the initial structure. The
red/blue bytes belong to the backward/forward chunk, which can be determined
by the red/blue byte in the initial structure. The white bytes are affected by
both red and blue bytes and we can only determine their values after a partial
match is found. The purple bytes are the guessed bytes.

Since each row of the state #D1^MR has unknown bytes (in white color),
if we went further back through MR^-1, all bytes would become unknown. The
values of 24 white bytes in row 0 to row 5 are guessed. Thus we can maintain 6
red bytes in each of the top 6 rows of the state #D1^SC. In the key state, we do not
guess the values, since the white key state #K1^SC does not affect the matching
state through the feedforward operation.

Fig. 2. Chunk separation for the preimage attack on 6-round compression function

All the possible values of the guessed bytes are used as extra freedom degrees 
to build the lookup table for the MitM. But after a partial match is found, we 
need to further check the correctness of the guessed values. More details about 
the guessing technique can be found in the following section. 


The Attack Algorithm. In order to evaluate the attack complexity, we need
to know the parameters: freedom degrees in red and blue bytes ($D_r$, $D_b$), size of
the partial matching m and the number of guessed bits $D_g$. The explanation on
calculating freedom degrees/size of matching point and how the partial matching
works can be found in previous papers [25,26]. Here we omit these details due
to the limited space.

To summarize, the parameters for the MitM attack in Fig. 2 are as follows.
Freedom degrees in red bytes: $D_r$ = 8 bytes = 64 bits (4 bytes in the key and 4
bytes in the data). Freedom degrees in blue bytes: $D_b$ = 32 bytes = 256 bits
(16 bytes in the key and 16 bytes in the data). Size of the guessed value (purple
bytes): $D_g$ = 24 bytes = 192 bits. Size of the partial match: m = 32 bytes = 256
bits (only in the data). Size of the full match: n = 512 bits.

The attack algorithm is as follows: 

Step 1. Randomly choose the values of the constants in the initial structure.

Step 2. For all the $2^{D_r}$ values $\{r_i\}$ of the red bytes in the initial structure and
$2^{D_g}$ guessed values $\{g_j\}$, go backward to the matching point and store all
$2^{D_r+D_g}$ partial matching values $F(r_i, g_j)$ in a look-up table L.

Step 3. For all the $2^{D_b}$ values $\{b_k\}$ of the blue bytes in the initial structure, go
forward to obtain the partial matching value $G(b_k)$ and check if it is in L.

Step 4. Once a partial match $(r_i, g_j, b_k)$ such that $F(r_i, g_j) = G(b_k)$ is found,
use $(r_i, b_k)$ to compute and check if the guessed value $g_j$ is correct. If the
guess is correct, check if it is a preimage.

Step 5. Repeat the above steps 1-4 to find a preimage.
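
The control flow of Steps 2-4 can be illustrated on a toy instance. Everything in the sketch below is hypothetical: the small arithmetic functions merely stand in for the backward chunk F, the forward chunk G, the true value of the guessed bytes and the remaining full-match check. They are not related to Whirlpool. The point is only to show how wrong guesses create extra partial matches that the verification in Step 4 filters out.

```python
from collections import defaultdict

R_BITS, B_BITS, G_BITS = 4, 6, 2   # toy sizes standing in for D_r, D_b, D_g

def true_guess(r: int, b: int) -> int:
    """Value the guessed bits really take; it depends on both chunks (toy)."""
    return (r + b) % (1 << G_BITS)

def backward(r: int, g: int) -> int:
    """Toy F(r, g): partial matching value computed by the backward chunk."""
    return (r + 4 * g) % 16

def forward(b: int) -> int:
    """Toy G(b): partial matching value computed by the forward chunk."""
    return (3 * b) % 16

def full_match(r: int, b: int) -> bool:
    """Toy stand-in for checking the remaining n - m bits."""
    return (r * b) % 3 == 0

def mitm_with_guess() -> list:
    # Step 2: tabulate F(r, g) for every red value and every guess.
    table = defaultdict(list)
    for r in range(1 << R_BITS):
        for g in range(1 << G_BITS):
            table[backward(r, g)].append((r, g))
    found = []
    # Step 3: look up G(b) for every blue value.
    for b in range(1 << B_BITS):
        for r, g in table.get(forward(b), []):
            # Step 4: discard wrong guesses, then run the full check.
            if g == true_guess(r, b) and full_match(r, b):
                found.append((r, b))
    return sorted(found)

def brute_force() -> list:
    """Reference search over all (r, b) pairs, without a table."""
    return sorted(
        (r, b)
        for r in range(1 << R_BITS)
        for b in range(1 << B_BITS)
        if backward(r, true_guess(r, b)) == forward(b) and full_match(r, b)
    )

print(mitm_with_guess() == brute_force())
```

In this toy instance the pair (r, b) = (12, 0) is a partial match under the guess g = 1, but the true guess value is 0, so it is rejected in Step 4. This is the filtering that costs the extra verification work analysed next.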

The complexity is explained as follows: 

Step 2. It takes $2^{D_r+D_g}$ computations and memory to build the look-up table.

Step 3. It takes $2^{D_b}$ computations to find all the $2^{D_b+D_r+D_g-m}$ partial matches.

Step 4. $2^{D_b+D_r+D_g-m}$ computations are needed to verify the correctness for all
the partial matches. There would be $2^{D_b+D_r-m}$ valid partial matches that
pass the correctness test, since the probability that $g_j$ is correct is $2^{-D_g}$.

Step 5. The probability that steps 1-4 succeed is $2^{D_b+D_r-m} \cdot 2^{-(n-m)} =
2^{D_b+D_r-n}$. The above steps are repeated $2^{n-D_b-D_r}$ times to find a
preimage.

Therefore, the complexity of the above algorithm is

$$2^{n-D_b-D_r} \cdot \left(2^{D_r+D_g} + 2^{D_b} + 2^{D_b+D_r+D_g-m}\right) = 2^n \cdot \left(2^{-D_r} + 2^{D_g-D_b} + 2^{D_g-m}\right). \tag{1}$$

With the given parameters, the complexity is about $2^{512} \cdot (2^{-64} + 2^{192-256} +
2^{192-256}) \approx 2^{448}$ compression function calls. Only step 2 requires $2^{64+192} = 2^{256}$
memory.
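
The complexity expression can be evaluated directly. The sketch below works with base-2 logarithms and uses names of our own choosing. It reports the dominant exponent, which is the figure quoted in the text, and also the exact logarithm of the three-term sum, which is slightly larger whenever several terms tie.

```python
from math import log2

def mitm_complexity_log2(n: int, d_b: int, d_r: int, d_g: int, m: int):
    """log2 of 2^n * (2^-D_r + 2^(D_g-D_b) + 2^(D_g-m)).

    Returns (dominant exponent, exact log2 of the sum).
    """
    terms = [n - d_r, n + d_g - d_b, n + d_g - m]
    dominant = max(terms)
    exact = dominant + log2(sum(2.0 ** (t - dominant) for t in terms))
    return dominant, exact

def table_memory_log2(d_r: int, d_g: int) -> int:
    """log2 of the look-up table size built in Step 2."""
    return d_r + d_g

dominant, exact = mitm_complexity_log2(n=512, d_b=256, d_r=64, d_g=192, m=256)
print("dominant exponent:", dominant)
print("exact log2 of the sum: %.3f" % exact)
print("log2(memory):", table_memory_log2(64, 192))
```

With the parameters of the 6-round attack all three terms coincide, so the choice of (b, r, w, g) balances the table-building, forward-computation and verification costs.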

It is observed that the pattern for the chunk separation can be represented
by several numbers: b = the number of blue rows in #D2^MR, r = the number
of red rows in #D2^I, w = the number of white rows in #D5^SC, and g = the number
of guessed rows in #D1^MR. Then the parameters for the MitM attack can be
calculated as: $D_b = 16(b - r)$ bytes, $D_r = 2w(8 - b)$ bytes, $D_g = g(8 - r)$
bytes and $m = 8(g + (8 - w) - 8) = 8(g - w)$ bytes. In the following sections,
we will continue using the parameters b, r, w and g to identify the pattern
for chunk separations. We searched for all the possible patterns of the chunk
separation by exhaustively enumerating the parameters b, r, w and g. Fig. 2 shows
the optimal complexity case (b, r, w, g) = (6, 4, 2, 6). Note that the 6-round attack
is also applicable without using freedom degrees of the key.
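
As a quick check of these formulas, the sketch below converts a pattern (b, r, w, g) into the MitM parameters in bits. It reads the first formula as $D_b = 16(b - r)$ bytes, which is the reading that reproduces the values stated for Fig. 2 ($D_b$ = 256, $D_r$ = 64, $D_g$ = 192, m = 256 bits) as well as the chosen-key row (7, 6, 2, 6) of Table 2. The function name is ours.

```python
def chunk_parameters(b: int, r: int, w: int, g: int) -> dict:
    """MitM parameters in bits for a chosen-key 6-round chunk separation.

    b, r, w, g are the numbers of blue, red, white and guessed rows.
    """
    d_b_bytes = 16 * (b - r)      # freedom of the forward (blue) chunk
    d_r_bytes = 2 * w * (8 - b)   # freedom of the backward (red) chunk
    d_g_bytes = g * (8 - r)       # guessed bytes
    m_bytes = 8 * (g - w)         # size of the partial match
    return {
        "D_b": 8 * d_b_bytes,
        "D_r": 8 * d_r_bytes,
        "D_g": 8 * d_g_bytes,
        "m": 8 * m_bytes,
    }

print(chunk_parameters(6, 4, 2, 6))
print(chunk_parameters(7, 6, 2, 6))
```

An exhaustive search over patterns, as described above, would loop over small ranges of b, r, w and g, feed each parameter set into the complexity formula and keep the minimum. The validity constraints on the patterns are not spelled out in this section, so they are left out of the sketch.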


Memoryless MitM Attacks. In [?], Morita et al. proposed the memoryless
MitM technique, which can be applied in our attack by designing the following
three functions:

1) a mapping from the partial matching value to the blue value, 

2) a mapping from the partial matching value to the red (and purple) value, 

3) a pseudo-random boolean switching function taking the partial matching 
value as the input. 

However, we found that the memoryless MitM has some limitations. The
memoryless MitM is very efficient for finding one match, but its complexity is
limited by half of the matching size m and increases linearly with the number
of matches. Namely, at most $2^{\max\{0, \min\{D_b - D_g, D_r, m/2 - D_g\}\}}$ computations can
be saved using the memoryless MitM. Using look-up tables, we can save at most
$2^{\max\{0, \min\{D_b - D_g, D_r, m - D_g\}\}}$ computations. This difference results in different
optimal chunk separations, which is considered in the following sections.
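
The two bounds differ only in whether the full matching size m or half of it enters the minimum. The sketch below evaluates both for parameter sets in bits. The function names are ours, and the example parameter sets are the chosen-key set of Sect. 4.2 and two sets read from the $D_b$, $D_r$, $D_g$, m columns of Table 2.

```python
def saved_log2(d_b: int, d_r: int, d_g: int, m: int, memoryless: bool) -> int:
    """log2 of the maximal saving over brute force (parameters in bits)."""
    match = m // 2 if memoryless else m
    return max(0, min(d_b - d_g, d_r, match - d_g))

def attack_log2(n: int, d_b: int, d_r: int, d_g: int, m: int, memoryless: bool) -> int:
    """log2 of the resulting complexity on an n-bit compression function."""
    return n - saved_log2(d_b, d_r, d_g, m, memoryless)

# (D_b, D_r, D_g, m) for the look-up-table attack of Sect. 4.2.
print(attack_log2(512, 256, 64, 192, 256, memoryless=False))
# The same separation is useless without memory: m/2 - D_g is negative.
print(attack_log2(512, 256, 64, 192, 256, memoryless=True))
# Separations with fewer guessed bits suit the memoryless variant better.
print(attack_log2(512, 128, 32, 96, 256, memoryless=True))
print(attack_log2(512, 128, 8, 120, 256, memoryless=True))
```

The second call shows why the optimal chunk separations differ: the separation that is best with a look-up table saves nothing in the memoryless setting, because its guessed part exceeds half of the matching size.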

4.3 The First Preimage Attacks 

A first preimage attack is the combination of a second preimage attack and
an attack on the last compression function which produces a message block with
correct padding. In order to find optimal first preimage attacks, we need to
consider many different attacks.

Two Types of Last Block Attacks. The first preimage attack must fulfill the 
message length padding. In a fixed-key attack on the compression function, 10 
padding bits can be chosen if the initial structure is placed at the beginning of 
the encryption. This technique was used in [?]. The probability that a random
message block satisfies a constraint of the padding string is $2^{-9}$. Details are
explained in Appendix 0 

In the chosen-key preimage attacks, the initial structure cannot be placed at 
the beginning of the compression function. So the chosen padding technique is 
not applicable. However, we can repeat the attack $2^9$ times to obtain a valid last
message block.

Since Whirlpool uses 256-bit length padding and we have satisfied only a small part
of it, the remaining part of the length cannot be known before the attack. Therefore,
we need expandable messages [?] to fulfill it.

Two Types of Second Preimages. In previous attacks, the key (chaining
value) is known before the attack. The preimage attack on the compression
function is to find a message block that connects two chaining values. The fixed-key
attack is equivalent to a second preimage attack if the input and output
chaining values are chosen consecutively from the known ones.

If the key is chosen, the value of the key (chaining value) can only be determined
after the attack. Then we need to connect it to one of the known chaining
values. This is done using a MitM step on the chaining values.

Different Combinations for the First Preimage Attack. First, we analyzed
all the 5/6-round fixed-/chosen-key attacks on compression functions and
turned them into second preimage and last-block attacks. Second, we considered
the fixed-key attacks with chosen padding and found more attacks on the last
message block. Finally, we combined the second preimage attacks and the
last-block attacks to find the first preimage attacks with the lowest computations
and the lowest memory respectively.

The detailed results of all preimage attacks are summarized in Table 2. Note
that we can adjust the time-memory tradeoff by choosing different combinations 
of second preimages and the last-block attacks or changing the tradeoff of MitM 
on the chaining value for chosen key attacks. 

Table 2. Detailed results on all preimage attacks on CF and hash function



5-Round Attacks 

b r w 

D b D r m 

Function 

Preimage 

Last Block 

Preimage 


chosen-key 


5 4 2 

128 96 128 

2 4iD ,2 9D 

2*°°,2 yD 

2^ 0 ,2 aD t 

2448, 2 96 t 


4 3 1 

128 64 128 

2“°, 0(1) 

2 ftox , 2 oz 

2*°' ,2 oz 

fixed-key 


4 3 2 

64 64 64 

2**°,2 D * 

2'“' , ,2°‘*t 

2*°' ,2°* 

ml 

5 4 2 

64 48 128 

2- d4 ,0 (1) 

2“,0(l)t 

2’ ,J ,0(1) 

2 465 , 0(1)1 

fixed-key 
chosen padding 


4 3 2 

55 63 64 



2*°',2 00 

ml 

5 4 2 

54 48 128 



2“*,0(l)t 

6- Round Attacks 

b r w g 

D h D r Dg m 

Functioi 

on 

Second 

Preimage 

Last Block 

Preimage 

chosen-key 


6 4 2 6 

256 64 192 256 

2** a ,2 zo ° 

2 4e \2^ OD t 

2 4 °', 2 ZOD t 

2 48 i,2 256 t 

mi 

7 6 2 6 

128 32 96 256 

2‘ 0U ,0(1) 

2* y ',2 ±D 

2 < ‘ oa ,0(l)t 

fixed-key 


6 5 12 

64 16 48 64 

2* yo , 2°* 

2 * y o ? 2 °* 

2 ouo ,2°* 

mi 

7 5 15 

128 8 120 256 

2 0U ‘,0(1) 

2 ou4 , 0(1)1 

2“ a ,0(l) 

2 504 , 0(1)1 

fixed-key 

chosen padding 


6 4 13 

118 16 96 128 



2 1: ’". 2 1 l ~ 

ml 

7 6 13 

54 8 48 128 





t : The attacks with the lowest computations. 
* : The attacks with the lowest memory. 
ml : The memoryless MitM attacks. 



Fig. 3. Left: previous approach Right: our approach 


5 Collision Attacks on the Compression Function 

5.1 Overview 

In order to generate collisions with previous rebound approaches, the states at
the beginning and the end must have the same differential form so that they can
cancel each other with the feed-forward operation. This is a strong constraint.
We overcome this constraint by generating local collisions several times, i.e.,
canceling differences of the data by using differences of the key. The idea is
illustrated in Fig. 3. Because the diffusions for the data and key are
identical, we can keep the same differential form. This makes it possible to use
a differential path with different differential forms at the beginning and the
end.

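The idea of using the key difference is advantageous not only for canceling
the output difference but also for constructing a high-probability differential
path by using the local collision. For example, we use the following
differential path for an 8-round collision attack. Here, "WH" represents the
whitening operation.

Key: $64 \xrightarrow{\text{WH}} 64 \xrightarrow{\text{1st R}} 8 \xrightarrow{\text{2nd R}} 1 \xrightarrow{\text{3rd R}} 8 \xrightarrow{\text{4th R}} 64 \xrightarrow{\text{5th R}} 8 \xrightarrow{\text{6th R}} 1 \xrightarrow{\text{7th R}} 8 \xrightarrow{\text{8th R}} 64$,

Data: $64 \xrightarrow{\text{WH}} 0 \xrightarrow{\text{1st R}} 8 \xrightarrow{\text{2nd R}} 1 \xrightarrow{\text{3rd R}} 8 \xrightarrow{\text{4th R}} 0 \xrightarrow{\text{5th R}} 8 \xrightarrow{\text{6th R}} 1 \xrightarrow{\text{7th R}} 8 \xrightarrow{\text{8th R}} 0$, (2)

where the most expensive part (a fully active state) is avoided for the data
processing part to reduce the attack complexity and to keep enough degrees of
freedom.

Fig. 4. Differential path for 7R attack. Grey bytes are active bytes. The inbound phase
for the data processing part is stressed by red squares.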

We use a rebound-attack approach to search for the values. First, the values 
for the key are searched. Then, the values for the data are searched for the fixed 
key pairs. The complexity is a sum of two searching phases, not a product. 


5.2 7-Round Collision Attack 


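We explain our 7-round collision attack, which requires $2^{64}$ computations
and memory to store $2^{8}$ states. The differential path is as follows; see its
illustration in Fig. 4.

Key: $64 \xrightarrow{\text{WH}} 64 \xrightarrow{\text{1st R}} 8 \xrightarrow{\text{2nd R}} 1 \xrightarrow{\text{3rd R}} 8 \xrightarrow{\text{4th R}} 64 \xrightarrow{\text{5th R}} 8 \xrightarrow{\text{6th R}} 8 \xrightarrow{\text{7th R}} 64$,

Data: $64 \xrightarrow{\text{WH}} 0 \xrightarrow{\text{1st R}} 8 \xrightarrow{\text{2nd R}} 1 \xrightarrow{\text{3rd R}} 8 \xrightarrow{\text{4th R}} 0 \xrightarrow{\text{5th R}} 8 \xrightarrow{\text{6th R}} 8 \xrightarrow{\text{7th R}} 0$.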


The key and the plaintext should have the same difference so that the plaintext 
difference can be canceled by the whitening operation. Then, we make a local 
collision after the 4th round, and another local collision after the 7th round. 


Searching Procedure for Key Schedule Part. The goal is to find a single
pair of key values satisfying the differential path for the key. The essential
part of this procedure is finding two values satisfying the middle three rounds,
$1 \to 8 \to 64 \to 8$. This can be done with the Start-from-the-Middle attack |0j.
The complexity is only $2^{8}$ computations and the amount of memory is $2^{8}$
states. If the middle three rounds are satisfied, the entire path is also
satisfied by simply extending the path by 2 rounds backward and 2 rounds
forward. Because this extension is deterministic, the complexity for 7 rounds
is unchanged.
Fig. 5. Details of the inbound phase. Panels: equivalent transformation; 1st
inbound phase for row(0); merge inbounds; 2nd inbound phase for row(0).


Searching Procedure for Data Processing Part. This phase is performed 
after the key values are fixed. The goal is finding a pair of plaintexts which 
follow the differential path and generate a collision in the output. The procedure 
is divided into the inbound phase and the outbound phase. 

Inbound Phase. The inbound phase is from state $\#D2^{I}$ to state $\#D4^{SB}$,
which are stressed by red squares in Fig. 4. For the inbound phase, we search
for the values with a similar approach to Mendel et al. m The details of the 
inbound phase are described in Fig. 5. Note that the key values are already fixed.
Hence, the differences for $\#D2^{I}$ and $\#D4^{SB}$ are uniquely fixed. First,
we apply an equivalent transformation to the third round, i.e., AK is performed
between SC and MR. Then, the inbound phase is further divided into three parts:
the first inbound phase, the second inbound phase, and the merging of the two
inbounds.

First Inbound Phase for Row 0: We aim to find $2^{8}$ paired values that
satisfy the differential path between $\#D2^{I}[SC^{-1}(row(0))]$ and
$\#D3^{SB}[row(0)]$, which are drawn in red in Fig. 5. We only compute a single
row; the other rows remain unfixed. The difference for the 8 bytes at
$\#D2^{I}[SC^{-1}(row(0))]$ is fixed to be the same as
$\#K2^{I}[SC^{-1}(row(0))]$ so that the difference of
$\#D2^{I}[SC^{-1}(row(0))]$ can be canceled by $AK^{-1}$ in the first round.
Then, for all $2^{8}$ differences in $\#D2^{MR}[0]$, we compute the
corresponding 8-byte difference at $\#D2^{SB}[SC^{-1}(row(0))]$. The average
probability that the fixed difference at $\#D2^{I}[SC^{-1}(row(0))]$ and a
computed one in $\#D2^{SB}[SC^{-1}(row(0))]$ have solutions for all 8 bytes is
$2^{-8}$. Because $2^{8}$ differences are examined in $\#D2^{MR}[0]$, one pair
is expected to have solutions and the number of obtained solutions is $2^{8}$ on
average. Finally, for all $2^{8}$ solutions, we compute the corresponding 8
bytes at $\#D3^{SB}[row(0)]$ and store them in a list $L_1$.

Second Inbound Phase for Row 0: This part is similar to the first inbound
phase. We aim to find $2^{8}$ paired values that satisfy the differential path
between $\#D3^{SB}[SC^{-1}(row(0))]$ and $\#D4^{SB}[row(0)]$, which are drawn
in yellow in Fig. 5. Again we only compute a single row. The difference for the
8 bytes at $\#D4^{SB}[row(0)]$ is fixed to be the same as $\#K4^{SB}[row(0)]$ so
that it can be canceled after the AK operation in the fourth round. For all
$2^{8}$ differences in $\#D3^{SC}[0]$, we compute the corresponding 8-byte
difference at $\#D4^{I}[row(0)]$, and check if solutions exist between the
fixed $\#D4^{SB}[row(0)]$ and the computed $\#D4^{I}[row(0)]$. After $2^{8}$
trials, we expect to obtain $2^{8}$ solutions on average. Finally, for the
$2^{8}$ solutions, we compute the corresponding 8 bytes at
$\#D3^{SB}[SC^{-1}(row(0))]$ and store them in a list $L_2$.

Merge Two Inbounds: One byte (in position 0) overlaps between the 8 bytes
stored in $L_1$ and those stored in $L_2$, hence we need to find a match. Both
the value and the difference need to match, and thus the probability of a match
is $2^{-16}$. Because $2^{16}$ combinations of the results in $L_1$ and $L_2$
are available, we expect to find one match. We use the other 49 unfixed bytes at
$\#D3^{SB}$ as degrees of freedom for the outbound phase. Because they can
produce $2^{49 \times 8} = 2^{392}$ values for the outbound phase, finding one
match is enough for this phase.
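The merging step above can be sketched in a few lines. This is only an illustration under stated assumptions: each list entry is modeled as a tuple of eight (value, difference) byte pairs, the lists are filled with random data rather than real inbound solutions, and the names `merge_on_overlap` and `random_entry` are hypothetical.

```python
import random
from collections import defaultdict

def merge_on_overlap(list1, list2, pos1=0, pos2=0):
    """Return all (a, b) pairs whose shared byte agrees in value AND difference.

    Each entry is a tuple of 8 (value, difference) byte pairs; pos1 and pos2
    give the position of the overlapping byte inside each entry.
    """
    index = defaultdict(list)
    for a in list1:
        index[a[pos1]].append(a)        # key: (value, difference) of the shared byte
    return [(a, b) for b in list2 for a in index.get(b[pos2], [])]

def random_entry(rng):
    # Placeholder data: random value, random non-zero difference per byte.
    return tuple((rng.randrange(256), rng.randrange(1, 256)) for _ in range(8))

rng = random.Random(2012)
L1 = [random_entry(rng) for _ in range(2 ** 8)]
L2 = [random_entry(rng) for _ in range(2 ** 8)]
# 2^16 candidate pairs, each matching with probability close to 2^-16,
# so about one match is expected on average.
print(len(merge_on_overlap(L1, L2)))
```

Indexing one list by the shared byte keeps the merge at roughly $2^{8}$ operations instead of testing all $2^{16}$ combinations one by one.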

The complexity of the inbound phase is $2^{8}$ computations for each of the
first and second inbound phases. Memory to store $2^{8}$ states is required to
generate $L_1$ and $L_2$. In summary, with $2^{8}$ computations and memory to
store $2^{8}$ states, up to $2^{392}$ solutions of the inbound phase can be
produced.


Outbound Phase. Due to the inbound phase, the differential path is ensured
to be satisfied up to the fourth round. The outbound phase is a brute force
approach to satisfy the differential path after the fourth round by using
solutions of the inbound phase. The only probabilistic event for the outbound
phase is the cancelation of the difference at the final output. This occurs
when the difference at $\#D7^{SB}[row(0)]$ is the same as the one at
$\#K7^{SB}[row(0)]$. Therefore, by examining $2^{64}$ solutions of the inbound
phase, we can obtain a collision at the final output.

In summary, a collision is generated with $2^{64}$ in time and $2^{8}$ in memory.


5.3 Extension to 8-Round Collision Attack and Other Variants 

The 7-round attack in Sect. 5.2 can be extended to 8 rounds. The differential
path up to the 4th round is exactly the same as the one for the 7-round attack.
Therefore, the inbound part is unchanged. In the outbound phase,
$8 \to 8 \to 64$ is replaced with $8 \to 1 \to 8 \to 64$. The entire path is
given in Eq. (2).

Because the attack procedure is very similar, we only mention the differences
from the 7-round attack. To search for the key values, we use the
Start-from-the-Middle approach. This time, the differential propagation
$8 \to 1$ needs to be satisfied probabilistically. Therefore, the complexity for
the key schedule part is $2^{56}$ in time and $2^{8}$ in memory. Note that the
complexity can be improved to $2^{48}$ with the linearized match-in-the-middle
technique |0|. Because this part is not the bottleneck, we omit its detailed
explanation. Also note that a single result is enough because the data
processing part can produce many solutions.

For the data processing part, the inbound phase is exactly the same as the
one for the 7-round attack, which requires $2^{8}$ in time and $2^{8}$ in
memory, and can produce up to $2^{392}$ solutions. In the outbound phase, the
probabilistic events are the differential propagation $8 \to 1$ and the
differential cancelation at the output state. Therefore, a collision for 8
rounds can be generated with a complexity of $2^{56+64} = 2^{120}$ computations
and $2^{8}$ states of memory.

It seems worth mentioning that our differential path has an iterative form:

Key: $64 \to 8 \to 1 \to 8 \to 64$,

Data: $0 \to 8 \to 1 \to 8 \to 0$.

Therefore, constructing a differential path for $4n$ rounds or $4(n-1)+3$ rounds
is possible. However, we could not find an attack for three iterations (12
rounds or 11 rounds) because the complexity is too high and the degrees of
freedom are too few.


Practical Near-Collision Attack on 7 Rounds. In some cases, near-collisions
can be a real threat because hash values are used after truncation. Our 7-round
attack in Sect. 5.2 can generate a 40-byte near-collision with a complexity
of $2^{40}$ computations and $2^{8}$ states of memory. For this attack, we only
cancel the difference in 5 bytes between $\#K7^{SB}[row(0)]$ and
$\#D7^{SB}[row(0)]$. Note that the brute force attack for a 40-byte
near-collision takes $2^{160}$ computations, and thus our attack is much faster.
We also implemented the attack on a PC, and confirmed that the attack works
correctly. An example of the generated
data is provided in Table 0 in Appendix Q 
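The figures quoted for the near-collision variant follow a simple pattern, which the sketch below tabulates. It rests on an assumption inferred from the numbers in the text rather than stated there: each canceled byte of row 0 zeroes the difference in one full 8-byte row of the output, and the generic cost is taken as the birthday bound on the colliding bits. The function name `near_collision_tradeoff` is hypothetical.

```python
def near_collision_tradeoff(cancelled_bytes):
    """Return (colliding_bytes, attack_log2, generic_log2) for a given number
    of canceled bytes in row 0 of the last S-box layer.

    Assumption: one canceled byte yields one fully canceled 8-byte output row.
    """
    if not 0 <= cancelled_bytes <= 8:
        raise ValueError("row 0 holds exactly 8 bytes")
    colliding_bytes = 8 * cancelled_bytes
    attack_log2 = 8 * cancelled_bytes            # one 2^-8 event per canceled byte
    generic_log2 = (8 * colliding_bytes) // 2    # birthday bound on the colliding bits
    return colliding_bytes, attack_log2, generic_log2

for j in (5, 8):
    print(j, near_collision_tradeoff(j))
```

With 5 canceled bytes this reproduces the 40-byte near-collision at $2^{40}$ against a generic $2^{160}$; with all 8 bytes it reproduces the $2^{64}$ full collision.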


Practical Collisions on 4 Rounds. All previous attacks require at least 
2 64 computations to generate a collision even for a small number of rounds. 
Therefore, we investigate the practical collision attack on a small number of 
rounds. 

Our differential path generates a local collision after the fourth round, and
everything up to the fourth round can be covered by the inbound phase.
Therefore, we can generate collisions of the 4-round Whirlpool compression
function with only $2^{8}$ computations and $2^{8}$ states of memory. No extra
practical example is given here
since the 7-round near-collision in Table 01 is also a 4-round collision. 


5.4 Theory vs Practice: Implementation of Rebound Attacks 

The difference distribution table (DDT) of the S-box is the core of the rebound
attack, since it provides an efficient method for satisfying the differential
paths. The S-box of Whirlpool is not as balanced as the one in AES. For a pair
of non-zero input and output differences, if there is a conforming value, we
call it a match. The matching probability of the Whirlpool S-box is lower than
that of the AES S-box.
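To make the notion of matching probability concrete, the following sketch computes the DDT of the AES S-box and the fraction of non-zero difference pairs that admit a conforming value. The AES S-box is rebuilt from its algebraic definition; the Whirlpool S-box is not reproduced here, but `ddt` and `matching_probability` (hypothetical helper names) accept any 256-entry table, so it could be substituted from its specification.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_inv(a):
    """Multiplicative inverse (0 maps to 0), computed as a^254."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result if a else 0

def rotl8(x, s):
    return ((x << s) | (x >> (8 - s))) & 0xFF

def aes_sbox():
    box = []
    for x in range(256):
        v = gf_inv(x)
        box.append(v ^ rotl8(v, 1) ^ rotl8(v, 2) ^ rotl8(v, 3) ^ rotl8(v, 4) ^ 0x63)
    return box

def ddt(sbox):
    """table[a][b] = number of x with S(x) xor S(x xor a) == b."""
    table = [[0] * 256 for _ in range(256)]
    for a in range(256):
        for x in range(256):
            table[a][sbox[x] ^ sbox[x ^ a]] += 1
    return table

def matching_probability(table):
    """Fraction of non-zero (input, output) difference pairs with a solution."""
    possible = sum(1 for a in range(1, 256) for b in range(1, 256) if table[a][b])
    return possible / (255 * 255)

AES_DDT = ddt(aes_sbox())
print(matching_probability(AES_DDT))   # 127/255 for AES, i.e. just under 1/2
```

For AES every non-zero input difference reaches 127 of the 255 non-zero output differences, which is where the usual "one match in two trials" estimate of the rebound attack comes from.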

The property of the Whirlpool S-box results in big differences between theory
and practice. Theoretically, one valid key pair is enough to find a match of the
MitM phase in the data processing part. In practice, however, we tried 109
different valid key pairs to find a solution for the data part. In every
matching step, we have to try more times to find a match, so the complexity to
find one solution is increased. However, the expected number of solutions for a
random difference pair does not depend on the DDT. Hence, the total complexity
is not increased if we need many solutions of the inbound phase. As a result,
the complexity of our 7-round and 8-round attacks is not affected, since the
complexity mainly depends on the many iterations in the outbound phase. The
theoretical complexity of our inbound phase for both key and data (to find a
4-round collision) is $2^{8}$. Because we only need one solution from the
inbound phase, experiments show that the practical complexity for the inbound
phase is increased by a factor of $2^{4}$ to $2^{7}$.

6 Concluding Remarks 

In this paper, we improved the attacks on Whirlpool with respect to the funda- 
mental security notions. For the preimage attack, the number of attacked rounds 
was extended by the guess-and-determine technique. Moreover, the complexity 
was improved by exploiting the freedom in the key value. For the collision attack, 
the difference was introduced in the key value, and a high probability differential 
path was constructed by canceling the difference in the data with the difference 
in the key. These results show several risks of using similar diffusions for the key 
and data. These also indicate that Whirlpool is still secure in practice. 
A Chunk Separation for Preimage Attack



Fig. 6. Chunk separation $(b, r, w, g) = (7, 5, 1, 5)$ for the memoryless second
preimage attack on the 6-round hash function. Legend: backward chunk, forward
chunk, guessing bytes, constants, unknown bytes.

B On the Message Length Padding 

In order to convert the attack on the compression function into an attack on
the hash function, we need to deal with the message padding first. For the
last message block, the lower half is the message length in binary
representation. Here, we use $L$ to denote the message length. If the last bit
of the fourth row #M[row(3)] in the message block #M is 1, we obtain
$L \equiv 255 \bmod 512$, so the last 9 bits of #M[row(7)] should be 011111111.
If the last two bits of #M[row(3)] are 10, we know that
$L \equiv 254 \bmod 512$, so the last 9 bits of #M[row(7)] should be 011111110.
We can therefore calculate the probability that a random message block is a
valid block with correct padding by adding up the probabilities for the
different suffixes of the upper half of the message block:

$$\sum_{i=1}^{256} 2^{-(9+i)} \approx 2^{-9}.$$
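The padding probability can be checked with exact rational arithmetic. The sketch below assumes the reading of the sum used here: a padding suffix of $i$ bits in the upper half occurs with probability $2^{-i}$ for $i = 1, \ldots, 256$, and the 9 low bits of the length field must then agree, costing another $2^{-9}$. The function name `valid_padding_probability` is hypothetical.

```python
from fractions import Fraction

def valid_padding_probability(max_suffix_bits=256, length_bits=9):
    """Sum of 2^-(length_bits + i) over all padding suffix lengths i >= 1."""
    return sum(Fraction(1, 2 ** (length_bits + i))
               for i in range(1, max_suffix_bits + 1))

p = valid_padding_probability()
# The ratio to 2^-9 is 1 - 2^-256, so the approximation is extremely tight.
print(float(p * 2 ** 9))
```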
Abstract. In this article we describe new generic distinguishing and
forgery attacks in the related-key scenario (using only a single related
key) for the HMAC construction. When HMAC uses a $k$-bit key, outputs an
$n$-bit MAC, and is instantiated with an $l$-bit inner iterative hash function
processing $m$-bit message blocks where $m = k$, our distinguishing-R
attack requires about $2^{n/2}$ queries, which improves over the currently best
known generic attack complexity $2^{l/2}$ as soon as $l > n$. This means that,
contrary to the general belief, using wide-pipe hash functions as the internal
primitive will not increase the overall security of HMAC in the related-key
model when the key size is equal to the message block size. We also
present generic related-key distinguishing-H, internal state recovery and
forgery attacks. Our method is new and elegant, and uses a simple cycle-size
detection criterion. The issue in the HMAC construction (not present
in the NMAC construction) comes from the non-independence of the two
inner hash layers, and we provide a simple patch in order to avoid this
generic attack. Our work finally shows that the choice of the opad and
ipad constant values in HMAC is important.

Keywords: HMAC, hash function, distinguisher, forgery, related-key. 

1 Introduction 

Hash functions are among the most important basic primitives in cryptography.
Informally, a hash function $H$ is a function that takes an arbitrarily long
message $M$ as input and outputs a fixed-length hash value of size $n$ bits.
Classical security requirements are collision resistance and (second-)preimage
resistance. Namely, it should be impossible for an adversary to find a collision
(two distinct messages that lead to the same hash value) in less than $2^{n/2}$
hash computations, or a (second-)preimage (a message hashing to a given
challenge) in less than $2^{n}$ hash computations.

Hash functions are used in many applications such as digital signatures,
message integrity checks and message authentication codes (MAC). A MAC is a
function that takes a $k$-bit secret key $K$ and an arbitrarily long message $M$
as inputs, and outputs a fixed-length tag of size $n$ bits. A MAC algorithm
should also meet some security requirements. It should be impossible to recover
the secret key except by exhaustive search, and it should be computationally
impossible to forge a valid MAC without knowing the secret key, the message
being chosen by the attacker (existential forgery) or not (universal forgery).

MACs are crucial for many security systems and are often implemented with 
the HMAC Pj algorithm, in particular for banking protocols or protocols securing 
Internet connections (TLS and IPSEC). HMAC was designed by Bellare et al. in 
1996 and is now widely standardized. It has the property to use an iterative hash 
function as internal component (thus composed of an iterative application of a 
compression function) and a proof of security is given in [2]: HMAC is a
pseudo-random function under the assumption that the compression function is itself a
pseudo-random function. 

A trivial generic extension attack exists for HMAC: by asking for enough queries
to obtain an internal collision, the attacker can then append extra message
blocks to generate other colliding HMAC outputs, therefore breaking the
existential forgery security criterion. In order to avoid this issue, many other
MAC constructions have been proposed and analyzed [26,25,12], reaching a
security beyond the $n/2$ birthday bound by using bigger hash function internal
state sizes. For example, the extension attack applied to an $n$-bit hash
function with a $2n$-bit internal state requires $2^{n}$ compression function
calls.

In parallel to the recent impressive advances on standardized hash function 
cryptanalysis, the community studied the possible impact on the security of HMAC 
when instantiated with these standards (such as MD5 [U3| or SHA-1 p3]). There 
have been also some related-key analysis of HMAC instantiated with real hash 
functions, but no generic attack is known in this model, i.e. without using any 
weakness from the internal hash function used. Note that the HMAC proof [2] only
holds when considering a single-key scenario and says nothing in the related-key 
model. 

The cryptanalysts also looked at other attacks such as distinguishing-R and 
distinguishing-H CS!. The aim of the former is to distinguish between a random 
function and the HMAC construction, while the latter aims at distinguishing if 
the compression function used inside a HMAC construction is a random function 
or a specific compression function instance. It is widely believed that for an
ideal narrow-pipe hash function, distinguishing-R should require about
$2^{n/2}$ computations, while distinguishing-H should require about $2^{n}$.

Our Contributions. In this article we introduce a new type of related-key
distinguisher and forgery attacks for HMAC based on cycle length detection,
requiring a birthday query complexity and only a single related key. The attack
complexities are summarized in Table 1, together with previous work that
analyzed HMAC instantiated with a dedicated hash algorithm.

Our attacks work when the inner hash function is iterative (which is the case 
for almost all known hash functions, and is necessary for HMAC anyway) and 
when a special condition is met on the key input. This condition depends on the 
value of the HMAC constants opad and ipad (which shows for the first time the
importance in the choice of their values) and it is always fulfilled when the key 
length k is equal to the message input length m of the compression function. 
HMAC is defined to even handle cases where k > m and k = m is likely to happen 
for example with lightweight hash functions for which the total internal state 
size has to remain rather small. One can cite DM-PRESENT or H-PRESENT 0 hash 
functions (PRESENT being already an ISO standard |0|), which have respectively 
80 bits and 64 bits of message input for their compression function. Also, a 
block cipher-based hash function using a common mode such as DaviesMeyer or 
MatyasMeyerOseas [TJ instantiated with the standardized AES da is also likely 
to meet the condition k = m. 

We emphasize that this work is the first that exploits related-keys to attack 
HMAC when modeling the compression function as an ideal primitive. They are 
also the first attacks applying on HMAC and not on NMAC, which helps to un- 
derstand the security loss when going from the latter to the former. Finally, 
our attacks are still applicable even when the internal hash function has a big 
$l$-bit internal state, unlike the known generic distinguishing or forgery
attacks such as the extension attack. Note that many SHA-3 candidates are
wide-pipe (like the finalists [14,15,24]) and that this is the current trend in
hash function design.
Therefore, this work shows that a wide-pipe hash function used in HMAC can be 
weaker than the one used in simple MAC constructions such as a secret-prefix 
MAC and its strengthened version LPMAC PDj- In these schemes, the key (and 
the message length) is simply prepended to the input message, and the hash 
value is the MAC value. Due to the double size of the internal state, no attack 
is known with a complexity smaller than $2^{n}$ computations, while our attack on
HMAC is more efficient, requiring only $2^{n/2+1}$ computations.

After a description of HMAC in Section 2, we introduce the generic
distinguishing-R attack (requiring about $2^{n/2+1}$ computations) in Section 3,
which is the basis for the internal state recovery attack in Section 4, the
forgery attack in Section 5 and the distinguishing-H attack in Section 6.
Finally, we discuss our results and propose a simple method to patch HMAC in
Section 7.

2 Description of HMAC 

A Hash Function H is a function that takes an arbitrary length input message 
M and outputs a fixed hash value of size n bits. When the hash function is 
iterative (for example see the classical Merkle-Damgard construction nam), 
the message $M$ is first padded and then divided into blocks $m_i$ of $m$ bits
each. Then, the message blocks are successively used to update an $l$-bit
internal state $cv_i$ (where $l \geq n$) with a compression function $h$:
$cv_{i+1} = h(cv_i, m_i)$, where $cv_0$ is initialized to a fixed public value
$cv_0 = IV$. Once all the message blocks have been processed, an output function
$g$ is applied to the last internal state value $cv_i$ so as to eventually
obtain $hash = g(cv_i)$. The output function therefore transforms an $l$-bit
value into an $n$-bit one.
Table 1. Summary of the attack complexities

Previous attacks on HMAC with dedicated hash algorithm

Attack          Key Setting   Target         Size   # Rounds   Complexity    Ref.
Dist.-H         Single key    MD4            128    Full       2 lzl.o       !5
Dist.-H         Single key    MD5            128    33/64      2^{126.1}
Dist.-H         Single key    3-pass HAVAL   128    Full       2^{228.6}     15
Dist.-H         Single key    4-pass HAVAL   128    102/128    2^{253.9}
Dist.-H         Single key    SHA0           128    Full       2^{121.5}
Dist.-H         Single key    SHA1           128    43/80      2^{121.5}
Dist.-H         Single key    SHA1           128    50/80      2^{153.5}
Inner key rec.  Single key    MD4            128    Full       2^{63}        9
Inner key rec.  Single key    SHA0           128    Full       2^{84}
Inner key rec.  Single key    SHA1           128    34/80      2^{32}        El
Inner key rec.  Single key    3-pass HAVAL   128    Full       2^{122}
Full key rec.   Single key    MD4            128    Full       2^{95}
Full key rec.   Single key    MD4            128    Full       2^{77}
Dist.-H         Single key    MD5            128    Full       2^{97}
Dist.-H         Related key   SHA1           128    58/80      2^{158.74}    El

New generic attacks on HMAC

Attack            Key Setting   Target            Old Generic Complexity   New Generic Complexity    Reference
Dist.-R           Related key   Wide-pipe         2^{l/2}                  2^{n/2+1}                 This paper
Dist.-H           Related key   Narrow-pipe†      2^{n}                    2^{n/2+1}                 This paper
Dist.-H           Related key   Narrow or Wide‡   2^{n}                    2^{n/2+2} + 2^{l-n+1}     This paper
Inner state rec.  Related key   Narrow or Wide‡   2^{n}                    2^{n/2+2} + 2^{l-n+1}     This paper
Ex. forgery       Related key   Narrow or Wide‡   2^{n}                    2^{n/2+2} + 2^{l-n+1}     This paper

For a wide-pipe hash function with l-bit internal state, our attacks improve
over the old generic complexity as long as l < 2n - 1.


The MAC Algorithm HMAC 0 is based on the NMAC construction that uses 
two $k$-bit keys $K_{out}$ and $K_{in}$. NMAC replaces the public IV of a hash
function $H(IV, M)$ by a secret key $K$ to produce a keyed hash function
$H(K, M)$. NMAC is defined by:

$$NMAC(K_{out}, K_{in}, M) = H(K_{out}, H(K_{in}, M)).$$

Since in practice a hash function is used as a black box and has a fixed IV,
HMAC simulates the keyed hash function $H(K, M)$ of NMAC by prepending a secret
key block $K$ to $M$, and computing $H(IV, K \| M)$, where $\|$ denotes
concatenation. Also, HMAC uses a single $k$-bit key $K$ which is padded with
zeros such that after padding the key length is equal to a multiple of $m$ bits.
For simplicity of the description and without loss of generality concerning our
attacks, in the rest of this article we assume that the key fits in one
compression function message block, $k \leq m$, and thus the length of the
padded key is $m$ bits (the notation of the keys therefore denotes the padded
keys). $K_{in}$ and $K_{out}$ are defined by
$K_{in} = K \oplus ipad = K \oplus \texttt{0x3636}\cdots\texttt{36}$ and
$K_{out} = K \oplus opad = K \oplus \texttt{0x5C5C}\cdots\texttt{5C}$,
where ipad and opad have the same length as a padded key. HMAC is defined by:

$$HMAC(K, M) = H(IV, K \oplus opad \,\|\, H(IV, K \oplus ipad \,\|\, M)).$$
Since the key padding in HMAC enforces that the first compression function
call(s) handle all and only the key material, we can rewrite

$$HMAC(K, M) = H_{K \oplus opad}(H_{K \oplus ipad}(M)) = H_{K_{out}}(H_{K_{in}}(M)),$$

where $H_K(X)$ represents the iterative hash function $H$ for which the initial
value is changed to $h(IV, K)$.
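The definition above can be checked against a standard library implementation. The sketch below is illustrative only: it picks SHA-256 as the inner hash (an assumption, any iterated hash in `hashlib` would do), the name `hmac_from_definition` is hypothetical, and the step that hashes keys longer than one block follows the HMAC standard rather than the simplified setting of this section.

```python
import hashlib
import hmac

def hmac_from_definition(key: bytes, message: bytes, hash_name: str = "sha256") -> bytes:
    """HMAC(K, M) = H((K xor opad) || H((K xor ipad) || M)), written out directly."""
    block_size = hashlib.new(hash_name).block_size     # m, in bytes
    if len(key) > block_size:                          # long keys are hashed first
        key = hashlib.new(hash_name, key).digest()
    padded = key.ljust(block_size, b"\x00")            # zero-pad the key to one block
    k_in = bytes(b ^ 0x36 for b in padded)             # K xor ipad
    k_out = bytes(b ^ 0x5C for b in padded)            # K xor opad
    inner = hashlib.new(hash_name, k_in + message).digest()
    return hashlib.new(hash_name, k_out + inner).digest()

tag = hmac_from_definition(b"secret key", b"message")
print(tag == hmac.new(b"secret key", b"message", "sha256").digest())
```

Because the padded key fills exactly one block, the first compression function call of each hash processes key material only, which is what allows the rewriting with $H_{K_{in}}$ and $H_{K_{out}}$.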

3 Generic Related-Key Distinguisher for HMAC 

3.1 General Description 

Before describing our attacks, we first emphasize that for the rest of the
section we will only use small $n$-bit messages $M$, such that after padding any
message fits into one compression function message input. In other words,
$|M \| pad| = m$ and we will always compute a single compression function call
in order to handle the whole message $M$. This is represented in Figure 1 and we
have

$$HMAC(K, M) = g(h(h(IV, K \oplus opad),\ g(h(h(IV, K \oplus ipad), M \| pad)) \| pad)) = f_{K_{out}}(f_{K_{in}}(M)),$$

where $f_K(X) = g(h(h(IV, K), X \| pad))$.

The general idea underlying our attacks comes from the observation that,
contrary to the case of NMAC, in HMAC the inner and outer functions are not
fully independent. Indeed, both inner and outer hash functions are the same
function $H$, and the inner and outer keys are related by the relation
$K_{in} \oplus K_{out} = ipad \oplus opad$.

Fig. 1. The computation of HMAC with an iterated hash function when the padded
message is small ($|M \| pad| = m$)

This is not an issue in the single-key model, since when assuming the internal
inner and outer compression functions as ideal, no information will leak on
their output from this inner/outer key relation. However, in the related-key
model the situation is different. When assuming that the key size $k$ is equal
to the padded key size (thus one message block, i.e., $k = m$), we can analyze
what happens when we query $HMAC(K, M)$ and $HMAC(K', M)$ with the related key
$K' = K \oplus ipad \oplus opad$. For the first query the oracle will reply

$$HMAC(K, M) = f_{K \oplus opad}(f_{K \oplus ipad}(M)) = f_{K_{out}}(f_{K_{in}}(M))$$

and for the second query the oracle will reply

$$HMAC(K', M) = f_{K' \oplus opad}(f_{K' \oplus ipad}(M)) = f_{K \oplus ipad}(f_{K \oplus opad}(M)) = f_{K_{in}}(f_{K_{out}}(M)).$$


One can easily see that the two oracles are doing the same computation, except
that ipad and opad (or $K_{in}$ and $K_{out}$) are swapped. In other words, we
have two oracles, one that applies $f_{K_{in}}$ and then $f_{K_{out}}$ (top
figure below), and one that does the opposite, $f_{K_{out}}$ and then
$f_{K_{in}}$ (bottom figure below).
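The swap of the inner and outer keys can be observed directly on a real HMAC implementation. The sketch below is a sanity check under stated assumptions: SHA-256 is chosen as the hash, the key is exactly one 64-byte block ($k = m$), and the helper names `related_key` and `swapped_hmac` are hypothetical.

```python
import hashlib
import hmac

BLOCK = 64  # SHA-256 message block size in bytes, so k = m

def xor_const(data: bytes, c: int) -> bytes:
    return bytes(b ^ c for b in data)

def related_key(key: bytes) -> bytes:
    """K' = K xor ipad xor opad, for a key that fills exactly one block."""
    if len(key) != BLOCK:
        raise ValueError("the swap needs a key of exactly one block")
    return xor_const(key, 0x36 ^ 0x5C)

def swapped_hmac(key: bytes, message: bytes) -> bytes:
    """Hash with K xor opad on the inside and K xor ipad on the outside."""
    k_in, k_out = xor_const(key, 0x36), xor_const(key, 0x5C)
    inner = hashlib.sha256(k_out + message).digest()
    return hashlib.sha256(k_in + inner).digest()

key = bytes(range(BLOCK))
msg = b"related-key check"
# Querying HMAC under K' gives the same tag as swapping the two keys under K.
print(hmac.new(related_key(key), msg, "sha256").digest() == swapped_hmac(key, msg))
```

The check only works because the key occupies a full block; with a shorter key the zero padding bytes of $K'$ would no longer be zero, which is why the condition $k = m$ matters.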



This non-random property does not seem easy to detect since the functions
$f_{K_{in}}$ and $f_{K_{out}}$ are parametrized with the secret key $K$, thus
they are completely unknown to the attacker. However, it is possible to detect
it using a cycle detection algorithm: the functions
$f_{K_{in}} \circ f_{K_{out}}$ and $f_{K_{out}} \circ f_{K_{in}}$ have the same
cycle structure. Indeed, it is easy to see that there is a one-to-one
correspondence between the cycles of $f_{K_{in}} \circ f_{K_{out}}$ and those of
$f_{K_{out}} \circ f_{K_{in}}$.

The attacker will start from an $n$-bit random input message, query the first
oracle (with key $K$), and keep querying as new message the MAC he just
received. He continues to do so for about $2^{n/2}$ queries until he gets a
collision among the MACs received. This collision in fact represents a cycle in
the successive computations of $f_{K_{in}} \circ f_{K_{out}}$, and this first
phase defines a first walk that we denote walk A. In a second step the attacker
also finds a cycle for the second oracle computations (with key
$K' = K \oplus ipad \oplus opad$), i.e., for $f_{K_{out}} \circ f_{K_{in}}$, and
that defines walk B. Finally, since the number of MACs obtained from the first
and second oracles is big enough, there is a good chance that there is a
collision between a MAC from walk A and an internal value of a MAC from walk B
(the internal value is the output of the first hash in HMAC). If so, then the
cycle lengths of the two cycles are necessarily the same since they follow
exactly the same
computation path starting from the collision. This is depicted in Figure |3 An 
attacker can use this criterion to distinguish between HMAC computations and 
a randomly chosen function, since in the latter case there is only a very low 
probability that the two cycles have the same length. We call the tail the part
of the walk that does not belong to the cycle, and we denote by $Z_A$ (resp.
$Z_B$) the point where the tail enters the cycle for walk A (resp. walk B).



3.2 The Distinguisher 

Let $\mathcal{F}^{n}$ be the set of functions from $n$ bits to $n$ bits. We
denote by $F_K$ and $F_{K'}$ the two oracles to which the adversary
$\mathcal{A}$ can make queries. The oracles are instantiated either with
$F_K = HMAC_K$ and $F_{K'} = HMAC_{K'}$ (with $K$ being a randomly chosen
$k$-bit key and $K' = K \oplus ipad \oplus opad$) or with two independent
randomly chosen functions $R_K$ and $R_{K'}$ from $\mathcal{F}^{n}$. The goal of
the adversary is to distinguish between the two cases and its advantage is given
by

$$Adv(\mathcal{A}) = \left| \Pr[\mathcal{A}(HMAC_K, HMAC_{K'}) = 1] - \Pr[\mathcal{A}(R_K, R_{K'}) = 1] \right|.$$


1st Phase (Walk A). The attacker first chooses a random small message $M_A$
of size $n$ bits and initializes $q_0^A = M_A$. Then, he queries $F_K(q_0^A)$
and stores the value obtained in $q_1^A$. He continues by querying $F_K(q_1^A)$
and storing the answer in $q_2^A$, etc., for $2^{n/2} + 2^{n/2-1}$ iterations.
If he observes a collision among the queries during the process, the attacker
stops. If no collision is found or if the collision occurred in the first
$2^{n/2}$ queries, the attacker outputs 0.


2nd Phase (Walk B). This phase is identical to the first phase, except that 
the attacker queries the oracle $F_{K'}$ instead of $F_K$. We denote by $q_i^B$ the queries 
asked during this phase and by $M_B$ the starting message value. 

3rd Phase (Cycle Detection). Since each query is obtained by applying the 
function $F_K$ (or $F_{K'}$) to the previous query, a collision among the $q_i^A$ (or among 
the $q_i^B$) naturally defines a cycle. If the cycle length of set A is equal to the cycle 
length of set B, the attacker outputs 1; otherwise he outputs 0. 
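The three phases can be sketched as follows. This is a minimal illustration, not the authors' implementation: the oracles are assumed to be Python callables on integers, and the names `walk_cycle_length` and `distinguish` are ours.

```python
def walk_cycle_length(oracle, start, n_bits):
    """Phases 1 and 2: iterate the oracle from `start`.

    Returns the cycle length when the first collision appears after the first
    2^(n/2) queries, and None in the cases where the attacker outputs 0.
    """
    first = 1 << (n_bits // 2)                 # 2^(n/2) queries that must stay collision-free
    total = first + (1 << (n_bits // 2 - 1))   # plus 2^(n/2 - 1) further queries
    index_of = {start: 0}                      # value -> index i of q_i
    q = start
    for i in range(1, total + 1):
        q = oracle(q)
        if q in index_of:
            if i <= first:                     # collision came too early
                return None
            return i - index_of[q]             # length of the cycle just closed
        index_of[q] = i
    return None                                # no collision within the query budget

def distinguish(oracle_k, oracle_k_prime, start_a, start_b, n_bits):
    """Phase 3: output 1 iff both walks produced cycles of the same length."""
    len_a = walk_cycle_length(oracle_k, start_a, n_bits)
    len_b = walk_cycle_length(oracle_k_prime, start_b, n_bits)
    return 1 if len_a is not None and len_a == len_b else 0
```

Storing every query in a dictionary costs memory proportional to the walk length; it is used here only because it makes the tail and cycle positions explicit.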

3.3 Complexity and Success Probability 

1st and 2nd Phases (Walk A and B). We first compute the probability that 
no collision is found among the first $2^{n/2}$ queries of the first (or of the 
second) phase. In the case of randomly chosen functions: 

$$P_{nc\text{-}rand} = \prod_{i=0}^{2^{n/2}-1} \left(1 - \frac{i}{2^n}\right) \simeq \prod_{i=0}^{2^{n/2}-1} e^{-i/2^n} = e^{-2^{n/2} \cdot (2^{n/2}-1)/2^{n+1}} \simeq e^{-1/2}$$

In the case of HMAC computations, a collision can occur either because of a 
collision on $f_{K_{in}}$ or because of a collision on $f_{K_{out}}$. Therefore, we have 

$$P_{nc\text{-}hmac} = \left( \prod_{i=0}^{2^{n/2}-1} \left(1 - \frac{i}{2^n}\right) \right)^2 \simeq \left( e^{-1/2} \right)^2 = e^{-1}$$

Then, we compute the probability that, when querying the $2^{n/2-1}$ remaining 
elements, a collision will eventually be found in the first (or in the second) 
phase: 

$$P_{c\text{-}rand} = 1 - \prod_{i=0}^{2^{n/2-1}-1} \left(1 - \frac{2^{n/2}+i}{2^n}\right) \simeq 1 - e^{-\left(2^{n/2-1}/2^{n/2} + 2^{n/2-1} \cdot (2^{n/2-1}-1)/2^{n+1}\right)} \simeq 1 - e^{-5/8}$$


Again, in the case of HMAC computations, a collision can occur either because of 
a collision on $f_{K_{in}}$ or because of a collision on $f_{K_{out}}$. Therefore, we have 

$$P_{c\text{-}hmac} = 1 - \left( e^{-\left(2^{n/2}/2^{n/2} + 2^{n/2} \cdot (2^{n/2}-1)/2^{n+1}\right)} \right)^2 \simeq 1 - e^{-3}$$


To summarize, the probability that the attacker does not output 0 during both the 
first and second phases is equal to $(P_{nc\text{-}rand} \cdot P_{c\text{-}rand})^2 \simeq 0.079$ with randomly 
chosen functions and to $(P_{nc\text{-}hmac} \cdot P_{c\text{-}hmac})^2 \simeq 0.122$ with HMAC. 


3rd Phase (Cycle Detection). We need to compute the probability that the 
cycles found in walk A and in walk B have the same length, for both the HMAC 
case and the randomly chosen functions case. We denote by $P_{cl\text{-}hmac}$ the former 
and by $P_{cl\text{-}rand}$ the latter. 

When the oracles are instantiated with HMAC, we already explained that $\mathrm{HMAC}_K$ 
and $\mathrm{HMAC}_{K'}$ are related by their cycle structure. If there exists a collision between 
a member of walk A and an internal value of a member of walk B, then we are 
ensured that they will enter a cycle of the same length and the attacker will 
output 1. Thus $P_{cl\text{-}hmac}$ is the probability that such a collision occurs. Since 
the first phase (resp. second phase) ensured that a collision occurs after $2^{n/2}$ 
queries, we are ensured that at least $2^{n/2}$ distinct elements exist in walk A (resp. 
walk B). Therefore, the probability $P_{cl\text{-}hmac}$ is lower bounded by 

$$P_{cl\text{-}hmac} \ge 1 - \prod_{i=1}^{2^{n/2}} \left(1 - \frac{2^{n/2}}{2^n}\right) \simeq 1 - \prod_{i=1}^{2^{n/2}} e^{-1/2^{n/2}} = 1 - e^{-1}$$

Now we need to evaluate the probability $P_{cl\text{-}rand}$ that the cycles in walk A and 
walk B have the same length for randomly chosen functions. Since we ensured 
that the collision happens in the last $2^{n/2-1}$ elements instead of the first $2^{n/2}$ 
elements for walk A, there must exist some value $z_A$, $1 \le z_A \le 2^{n/2-1}$, such 
that $q^A_{2^{n/2}+z_A}$ is the first query colliding with some previous query in walk A. 
So the cycle length of walk A is uniformly distributed between 1 and $2^{n/2} + z_A$. 
Similarly for walk B, there exists a value $z_B$, $1 \le z_B \le 2^{n/2-1}$, such that the 
cycle length of walk B is uniformly distributed between 1 and $2^{n/2} + z_B$. Without 
loss of generality, let $z_A$ be smaller than or equal to $z_B$. Thus, the probability 
that the cycles in walk A and walk B have the same length is given by 

$$P_{cl\text{-}rand} = \sum_{i=1}^{2^{n/2}+z_A} \frac{1}{2^{n/2}+z_A} \times \frac{1}{2^{n/2}+z_B} \le \sum_{i=1}^{2^{n/2}+z_A} \frac{1}{2^{n/2}} \times \frac{1}{2^{n/2}+z_A} = 2^{-n/2}$$

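Overall the advantage of the adversary is 

$Adv(\mathcal{A}) = \left| \Pr[\mathcal{A}(\mathrm{HMAC}_K, \mathrm{HMAC}_{K'}) = 1] - \Pr[\mathcal{A}(R_K, R_{K'}) = 1] \right|$ 

$\ge \left| (P_{nc\text{-}hmac} \cdot P_{c\text{-}hmac})^2 \cdot P_{cl\text{-}hmac} - (P_{nc\text{-}rand} \cdot P_{c\text{-}rand})^2 \cdot P_{cl\text{-}rand} \right|$ 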

$\simeq (e^{-1} \cdot (1 - e^{-3}))^2 \cdot (1 - e^{-1}) \simeq 0.077$ 

and it can be increased towards $(1 - e^{-1}) \simeq 0.63$ by allowing the attacker to spend 
a bit more computation in the first and second phases (instead of outputting 0, 
he just starts the phase over until he succeeds). 
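The numerical values quoted in this section can be re-derived with a few lines of arithmetic. The sketch below makes one assumption explicit: it takes $P_{c\text{-}hmac} \simeq 1 - e^{-3}$, which is the reading that reproduces the stated 0.122 and 0.077; the variable names are ours.

```python
import math

p_nc_rand = math.exp(-1 / 2)       # no collision in the first 2^(n/2) queries, random case
p_c_rand = 1 - math.exp(-5 / 8)    # collision in the remaining 2^(n/2 - 1) queries, random case
p_nc_hmac = math.exp(-1)           # same events for HMAC (two internal functions per query)
p_c_hmac = 1 - math.exp(-3)        # assumed reading, see the lead-in
p_cl_hmac = 1 - math.exp(-1)       # lower bound on equal cycle lengths for HMAC

survive_rand = (p_nc_rand * p_c_rand) ** 2
survive_hmac = (p_nc_hmac * p_c_hmac) ** 2
# The random-function term is at most about 2^(-n/2) and is neglected here.
advantage = survive_hmac * p_cl_hmac

print(round(survive_rand, 3), round(survive_hmac, 3), round(advantage, 3))
```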

The complexity of the distinguisher is about $2^{n/2} + 2^{n/2-1}$ computations for 
each of the first and second phases, thus about $2^{n/2+1}$ computations in total. 

As a proof of concept, we have implemented the attack for HMAC instantiated 
with SHA-2 truncated to 32 bits and the results can be found in the full version 
of the article. 


4 Internal State Recovery Attack 

In this section we extend the distinguisher from Section 3 and we present an 
internal-state-recovery attack that will be useful for the later sections showing 
forgery and distinguishing-H attacks. These attacks are applicable to both 
narrow-pipe and wide-pipe hash functions under some conditions. As an example, 
for a narrow-pipe hash function without finalization g(·), e.g. SHA-256 and 
SHA-512 |2U, these attacks achieve a birthday-bound complexity $2^{n/2}$, thus 
significantly reducing the expected complexity of $2^n$. 


4.1 General Idea 

We observe that if walk A and walk B follow the structure in Figure 2, then for 
any query in the cycle of walk A, denoted by $q^A$, the inner hash value $H_{K_{in}}(q^A)$ 
is necessarily equal to some query in the cycle of walk B, denoted by $q^B$. The goal 
is therefore to find this query $q^B$ among all candidate values (all the members 
of walk B that belong to the cycle). In other words, we would like to synchronize 
the two cycles from walk A and walk B, which we already know have the same 
length. 

In general, even if we know that walk A and walk B have the same length 
and are actually doing the same computations, it seems hard to synchronize the 
two cycles because we do not know where the tail in walk A and in walk B is 
entering the cycle. However, in the special case where the collision between walk 
A and walk B happens in the tail (and not in the cycle), then we know that the 
tails are entering the cycle at the same position (see Figure 3). In that case, the 
cycles are directly synchronized and the attacker knows all the successive hash 
output values for every computation in the cycle (he knows the output values of 
all the $H_{K_{in}}$ and $H_{K_{out}}$ computed inside the cycle). 

The first and second phases of the attack will be devoted to building a walk 
A and a walk B with a rather long tail, such that during the third phase there is 
a good chance to get a collision between an element of the tail of walk A and an 
element of the tail of walk B. In order to recover an internal state, the attacker will focus 
on one randomly chosen value belonging to the cycle, denoted by $q^A$, and its next 
hash output $q^B$, with $q^B = H(K_{in}, q^A)$. Then he will try to guess the internal 
hash value $X = h(h(IV, K_{in}), q^A \| pad_1)$ that led to $q^B$, i.e. $g(X) = q^B$. 

Fig. 3. Two walks A and B colliding and sharing a cycle. The left example shows 
unsynchronized cycles (the collision happens in the cycle, thus $Z_A \neq Z_B$), the right 
shows synchronized cycles (the collision happens before the cycle, in the tails, thus 
$Z_A = Z_B$). 

We assume that g(·) is easy to invert (given an output u, it is easy to find 
all preimages leading to u) and that it is balanced (given an output value, there 
exist $2^{l-n}$ corresponding input values through g). Inverting g provides $2^{l-n}$ 
candidates $X_j$ such that $g(X_j) = q^B$. For each of these candidates, we will apply 
a filter to remove the bad guesses. The filter is based on an offline extension of 
the computation of $H_{K_{in}}$. 


4.2 Detailed Procedure 

1st Phase (Walk A). The attacker chooses a random small message $M_A$ of 
size n bits and initializes $q_0^A = M_A$. He then queries $\mathrm{HMAC}_K(q_0^A)$ and stores 
the value obtained in $q_1^A$. He continues by querying $\mathrm{HMAC}_K(q_i^A)$ and storing 
the answer in $q_{i+1}^A$, for $i = 0, 1, \ldots, 2^{n/2}$. If no cycle is generated (no collision 
among the queries $q_i^A$) or if the walk A generated has a tail smaller than $2^{n/2-2}$, 
then the attacker chooses another random n-bit message as starting query $q_0^A$ 
and repeats the search procedure until a walk A with a cycle and a tail of at 
least $2^{n/2-2}$ elements is found. 

We evaluate the success probability of finding a proper walk A by trying 
one set of $2^{n/2}$ iterative queries. First, we would like the first $2^{n/2-1}$ elements to be 
distinct, and the probability of this event is approximately $e^{-1/8}$ (the evaluation is 
similar to the one from Section 3.3, thus we omit it here). Then the probability 
that the last $2^{n/2-1}$ queries produce a cycle is approximately $e^{-3/8}$. We evaluate 
the probability that the tail of walk A has at least $2^{n/2-2}$ elements. Note that 
we have guaranteed that the query $q_i^A$ causing the first collision happens during 
the i-th iteration, with $i > 2^{n/2-1}$. Therefore, the probability that $q_i^A$ does 
not collide with the first $2^{n/2-2}$ elements is $1 - (2^{n/2-2}/i) > 1/2$. Finally, we 
conclude that by trying one set of $2^{n/2}$ iterative queries, the success probability 
of generating a proper walk A is at least $e^{-1/8} \times e^{-3/8} \times 1/2 \simeq 0.303$. 


2nd Phase (Walk B). The procedure is identical to the first phase, except that 
the attacker queries $\mathrm{HMAC}_{K'}$ with $K' = K \oplus ipad \oplus opad$ instead of $\mathrm{HMAC}_K$. 
He obtains a walk B that has a cycle and whose tail contains at least $2^{n/2-2}$ 
elements with probability of about 0.303 (identical to the 1st phase). 


3rd Phase (Collision). The attacker checks that there is a collision between 
an element from walk A and one from walk B, which can be done by verifying 
that walk A and walk B have the same cycle length. More precisely, he wants this collision 
to happen between a member of the tail of walk A and a member of 
the tail of walk B. This event happens with probability $1 - e^{-1} \simeq 0.63$, and if such 
a collision occurs, then the cycles from walk A and walk B are synchronized. In 
other words, the attacker knows that the tail in walk A entered the cycle at the 
same position that the tail in walk B entered its own cycle, and as a consequence 
he knows all the successive internal values for the $\mathrm{HMAC}_K$ and $\mathrm{HMAC}_{K'}$ computations 
belonging to the cycle. We denote by $q^A$, $q^B$ and $q^C$ three consecutive internal 
states, that is $q^B = H(K_{in}, q^A)$, $q^C = H(K_{out}, q^B)$ and $q^C = \mathrm{HMAC}_K(q^A)$. 

4th Phase (Recovery by Filtering). Given $q^A$, $q^B$ and $q^C$, known by the 
attacker, the goal is now to recover the inner hash function internal state just 
before applying the output function g. In other words, the attacker is trying 
to recover $X = h(h(IV, K_{in}), q^A \| pad_1)$, with $g(X) = q^B$. He first inverts the 
output function g from $q^B$ and gets $2^{l-n}$ candidate values $X_j$. 

The attacker chooses $2^{n/2}$ random distinct messages $M_i$, $0 \le i < 2^{n/2}$, such 
that each $q^A \| pad_1 \| M_i \| pad_2$ fits into exactly two message blocks. He queries the 
messages $q^A \| pad_1 \| M_i$ to $\mathrm{HMAC}_K$ and looks for collisions among the outputs. A 
collision happens in the inner hash with probability $1 - e^{-1/2}$. At the same time, we 
want to avoid faulty collisions, i.e. collisions in the outer hash instead of the inner 
hash, and this happens with probability $e^{-1/2}$. We denote by (M, M') the pair of 
colliding messages found, and the success probability is $(1 - e^{-1/2}) \times e^{-1/2} \simeq 0.23$. 

For each of the $2^{l-n}$ candidate values $X_j$, the attacker computes the values 
$g(h(X_j, M \| pad_2))$ and $g(h(X_j, M' \| pad_2))$, and checks whether they are equal. If 
this is the case, the attacker stores $X_j$ as a very likely candidate for the yet unknown 
value of X. Since there are in total $2^{l-n}$ candidate values, and the filter is an n-bit 
condition, $2^{l-2n}$ candidates will be stored. The attacker repeats the colliding messages 
(M, M') search and the filtering process until only one candidate, namely the 
real value of X, is left. 
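The filtering step can be illustrated on a toy compression function. Everything in the sketch below is an assumption made for the demonstration: a 16-bit chaining value (l = 16), an 8-bit output (n = 8), `h` built from truncated SHA-256, `g` taken as "keep the top n bits", and message blocks that stand for $M \| pad_2$. The colliding pairs are produced with knowledge of the secret state only to keep the example short; the attacker would obtain them from colliding MAC outputs.

```python
import hashlib

L_BITS, N_BITS = 16, 8     # toy sizes (assumptions): l-bit state, n-bit output

def h(state: int, block: bytes) -> int:
    """Toy compression function with an l-bit chaining value."""
    digest = hashlib.sha256(state.to_bytes(2, "big") + block).digest()
    return int.from_bytes(digest[:2], "big")

def g(state: int) -> int:
    """Toy output function: keep the top n bits (balanced and trivially invertible)."""
    return state >> (L_BITS - N_BITS)

def preimages_of_g(q: int) -> list:
    """All 2^(l-n) states X_j with g(X_j) = q."""
    low_bits = L_BITS - N_BITS
    return [(q << low_bits) | low for low in range(1 << low_bits)]

def colliding_pairs(secret_x: int):
    """Yield pairs (M, M') with g(h(X, M)) == g(h(X, M')) for the secret X."""
    seen = {}
    counter = 0
    while True:
        block = counter.to_bytes(4, "big")
        out = g(h(secret_x, block))
        if out in seen:
            yield seen[out], block
        seen[out] = block
        counter += 1

def recover_state(q_b: int, pairs, max_pairs: int = 32) -> list:
    """Keep only the candidates that reproduce every observed collision."""
    candidates = preimages_of_g(q_b)
    for _, (m1, m2) in zip(range(max_pairs), pairs):
        candidates = [x for x in candidates if g(h(x, m1)) == g(h(x, m2))]
        if len(candidates) == 1:
            break
    return candidates

secret_x = 0xBEEF
survivors = recover_state(g(secret_x), colliding_pairs(secret_x))
print(secret_x in survivors, len(survivors))
```

The true state passes every filter by construction, while a wrong candidate survives one pair only with probability about $2^{-n}$, which is why a small number of colliding pairs is enough.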

Overall, the complexity of the attack is less than $2^{n/2+2}$ queries and $2^{l-n+1}$ 
offline computations. The success probability is around 0.303 x 0.303 x 0.63 x 
0.23 = 0.013. By repeating the phases from 2 to 4 several times, the success 
probability can be increased. 
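As a quick sanity check, the overall success probability can be recomputed from the closed-form factors given in the four phases. The variable names are ours; with unrounded factors the product comes out slightly above the 0.013 obtained from the rounded values, mainly because $(1 - e^{-1/2}) \times e^{-1/2}$ is closer to 0.24 than to 0.23.

```python
import math

p_walk = math.exp(-1 / 8) * math.exp(-3 / 8) * 0.5    # proper walk, phases 1 and 2 each
p_sync = 1 - math.exp(-1)                             # collision between the two tails
p_pair = (1 - math.exp(-1 / 2)) * math.exp(-1 / 2)    # inner collision, no outer collision
p_total = p_walk ** 2 * p_sync * p_pair

print(round(p_walk, 3), round(p_sync, 2), round(p_total, 4))
```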

5 Forgery Attacks 

This section describes the related-key forgery attacks on HMAC. The adversary is 
given access to two oracles $\mathrm{HMAC}_K$ and $\mathrm{HMAC}_{K'}$, where $K' = K \oplus ipad \oplus opad$. After interacting 
with $\mathrm{HMAC}_K$ and $\mathrm{HMAC}_{K'}$, he outputs a message and MAC value $(M, \sigma)$, such 
that the message has not been queried to $\mathrm{HMAC}_K$. If $\sigma$ is a valid MAC value for 
M through HMAC with key K, the adversary is said to have successfully forged 
M for $\mathrm{HMAC}_K$. More precisely, when the attacker is free to choose M it is an 
existential forgery, while if the message is fixed by the challenger beforehand it 
is a universal forgery. 

A commonly known generic existential forgery attack on HMAC (even in the 
single-key setting) is the so-called extension attack. The attacker first searches 
for a pair of messages (M, M') colliding on the last l-bit internal state of the 
inner hash (just before the application of the output function g in the inner hash 
function call), then appends to each of them the same additional message block 
X. Since the last internal state is the same for both messages (M, M'), the two 
computations of this extra message block X will also behave identically. Finally, 
by querying the HMAC value for one of the two messages, $M \| X$, the attacker directly 
forges the other one, $M' \| X$, by outputting the same MAC value. The complexity 
of this existential forgery attack is around $2^{l/2}$ queries. 
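The extension attack can be demonstrated on a deliberately weak toy MAC. The sketch below rests on several assumptions of ours: a 16-bit narrow-pipe Merkle-Damgard hash built from truncated SHA-256, 8-byte blocks, messages made of whole blocks, and no length padding. The function names are hypothetical. Inner-state collisions are told apart from outer-hash collisions by checking that the MACs still agree after appending a few common check blocks.

```python
import hashlib

STATE_BYTES = 2   # toy narrow-pipe hash: l = n = 16 bits (assumption)
BLOCK = 8         # toy block size in bytes (assumption)
IV = b"\x00" * STATE_BYTES

def compress(state: bytes, block: bytes) -> bytes:
    return hashlib.sha256(state + block).digest()[:STATE_BYTES]

def toy_hash(data: bytes) -> bytes:
    """Merkle-Damgard iteration over whole blocks; length padding is omitted."""
    assert len(data) % BLOCK == 0
    state = IV
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

def toy_hmac(key: bytes, msg: bytes) -> bytes:
    k_in = bytes(b ^ 0x36 for b in key)
    k_out = bytes(b ^ 0x5C for b in key)
    inner = toy_hash(k_in + msg)
    return toy_hash(k_out + inner.ljust(BLOCK, b"\x00"))

def extension_forgery(oracle, extra: bytes, checks, max_queries: int = 4096):
    """Return (forged_message, tag) without ever querying forged_message."""
    seen = {}   # tag -> one-block messages that produced it
    for counter in range(max_queries):
        msg = counter.to_bytes(BLOCK, "big")
        tag = oracle(msg)
        for other in seen.get(tag, []):
            # An inner-state collision survives any common suffix;
            # an outer-hash collision almost surely does not.
            if all(oracle(other + c) == oracle(msg + c) for c in checks):
                return msg + extra, oracle(other + extra)
        seen.setdefault(tag, []).append(msg)
    return None

key = bytes(range(BLOCK))
queried = set()

def oracle(message: bytes) -> bytes:
    queried.add(message)
    return toy_hmac(key, message)

result = extension_forgery(oracle, b"FORGED!!", [b"check--1", b"check--2", b"check--3"])
```

With a 16-bit state a few hundred queries suffice here, which mirrors the $2^{l/2}$ query cost of the generic attack.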

We extend the internal-state-recovery attack from Section 4 to an existential 
forgery attack. The method is simple. Following the procedure in Section 4, the 
attacker first recovers the internal state X during the $\mathrm{HMAC}_K$ computation of one 
of the n-bit messages queried, and we denote this message by M. Then, using 
about $2^{l/2}$ computations, he generates offline a pair of distinct messages M' and 
M'' of the same length satisfying $g(h(X, M' \| pad_2)) = g(h(X, M'' \| pad_2))$, where 
$pad_2$ stands for the padding appended to the message $M \| M'$ (or $M \| M''$) when 
applying the hash function H. Finally, the attacker queries $M \| pad_1 \| M'$ to the 
oracle $\mathrm{HMAC}_K$ and receives a value T', where $pad_1$ stands for the padding added 
to the message M when applying the hash function H. He can forge the MAC 
value T'' for the message $M \| pad_1 \| M''$ through $\mathrm{HMAC}_K$, since T'' = T'. The overall 
complexity of this attack is $2^{n/2+2}$ queries and $2^{l-n} + 2^{l/2}$ computations. Note 
that in particular for the case l < 2n, our attack is faster than the commonly 
known existential forgery attack requiring $2^{l/2}$ computations. 

One can trivially extend this existential forgery attack to an “almost-universal” 
forgery attack, where the attacker can only choose the first block and the 1/2 
first bits of the second block of the message to be forged. In practice, this would 
be very close to a universal forgery if one assumes that a few bytes of data in 
the header of the messages to be MACed can be controlled by the attacker. 

6 Distinguishing-H Attacks 

This section proposes two distinguishing-H attacks in the related-key setting. 
Let $\mathcal{F}_{m+n}^n$ be the set of functions from m + n bits to n bits. The attacker is 
given access to two oracles $\mathrm{HMAC}_K$ and $\mathrm{HMAC}_{K'}$ with $K' = K \oplus ipad \oplus opad$. The 
compression function of the HMAC oracles is instantiated either with a known 
dedicated function h or with a randomly chosen function r from $\mathcal{F}_{m+n}^n$, which 
we denote by $(\mathrm{HMAC}^h_K, \mathrm{HMAC}^h_{K'})$ and $(\mathrm{HMAC}^r_K, \mathrm{HMAC}^r_{K'})$ respectively. The goal of the 
adversary is to distinguish between the two cases, and its advantage is given by 

$$Adv(\mathcal{A}) = \left| \Pr[\mathcal{A}(\mathrm{HMAC}^h_K, \mathrm{HMAC}^h_{K'}) = 1] - \Pr[\mathcal{A}(\mathrm{HMAC}^r_K, \mathrm{HMAC}^r_{K'}) = 1] \right|$$


6.1 Distinguishing-H Attack I: Comparing Cycles Lengths 

The distinguisher in Section 3 can be extended to a distinguishing-H attack, as 
long as the finalization g(·) is bijective and invertible, for example the identity 
function. Without loss of generality, we omit the output function g. The only 
difference from the distinguisher in Section 3 is that, in order to produce 
walk A and walk B, we make full-block iterative queries, namely m-bit 
queries, instead of n-bit queries. A graphical view of one iteration in a walk is 
given in Figure 4. Let $pad_1$ be the padding to an n-bit message and $pad_2$ the 
padding to an m-bit message. The attacker first chooses a small random n-bit 
value $q_0^A$. He then queries $q_0^A \| pad_1$ to $\mathrm{HMAC}_K$ and receives $X_0$. He computes 
$h(X_0, pad_2)$ offline and stores the output as $q_1^A$. He continues to query $q_i^A \| pad_1$, 
receive $X_i$ and apply $h(X_i, pad_2)$ offline to produce $q_{i+1}^A$. With the same process, 
the attacker produces walk B, except that he queries $\mathrm{HMAC}_{K'}$ instead of 
$\mathrm{HMAC}_K$. 

If the HMAC oracles are instantiated with h, then $h(\mathrm{HMAC}_K(\cdot), pad_2)$ is $f_{K_{in}} \circ f_{K_{out}}$ 
and $h(\mathrm{HMAC}_{K'}(\cdot), pad_2)$ is $f_{K_{out}} \circ f_{K_{in}}$, where $f_{K_{in}}$ and $f_{K_{out}}$ are defined in 
Figure 0. So walk A and walk B have a good chance to have the structure explained 
in Section 3 and depicted in Figure 2, leading to cycles of equal length. On 
the other hand, if the HMAC oracles are instantiated with r, walk A and walk B are 
independent. Thus, by detecting the cycle lengths, the adversary can distinguish 
$(\mathrm{HMAC}^h_K, \mathrm{HMAC}^h_{K'})$ from $(\mathrm{HMAC}^r_K, \mathrm{HMAC}^r_{K'})$. The complexity and the success 
probability are identical to the ones for the distinguisher in Section 3. 



Fig. 4. Distinguisher-H attack I 


6.2 Distinguishing-H Attack II: Recovering Internal State 

The internal state recovery attack in Section 4 can be extended to a distinguishing-H 
attack as well. The adversary first regards the HMAC oracles as $(\mathrm{HMAC}^h_K, \mathrm{HMAC}^h_{K'})$, 
and applies the internal state recovery procedure from Section 4 to obtain an internal 
state value X of some n-bit query $q^A$ in a walk. Then he searches offline for a pair 
of distinct messages (M, M') satisfying $g(h(X, M)) = g(h(X, M'))$, which costs 
$2^{n/2}$ computations. Finally, he queries $\mathrm{HMAC}_K$ with $q^A \| pad_1 \| M$ and $q^A \| pad_1 \| M'$ 
to check whether the two MAC values collide. If they do, the attacker outputs 1; 
otherwise he outputs 0. 

If the compression function is h, the probability that $\mathrm{HMAC}_K(q^A \| pad_1 \| M)$ collides 
with $\mathrm{HMAC}_K(q^A \| pad_1 \| M')$ is equal to the success probability of recovering 
X in the attack of Section 4. If the compression function is r, the probability 
that $\mathrm{HMAC}_K(q^A \| pad_1 \| M) = \mathrm{HMAC}_K(q^A \| pad_1 \| M')$ is negligible. 

Overall, the complexity is $2^{n/2+2}$ queries and $2^{l-n+1} + 2^{n/2}$ offline computations, 
and the success probability is 0.013. 

7 Patching HMAC and Discussions 

We emphasize again that the related-key issue depicted in this article only exists 
when the attacker can query $f_{K_{out}} \circ f_{K_{in}}$ and $f_{K_{in}} \circ f_{K_{out}}$ with related-key relations, 
and can therefore keep the two computation chains synchronized if a collision 
happens. In the case of HMAC this is possible only when k = m or k = m - 1, 
since the last bits of ipad and opad are equal (otherwise, for a smaller key, the 
attacker cannot build a proper related key). This shows that the choice of 
ipad and opad is not anecdotal. For example, if ipad and opad were very 
similar, then our attacks would work for basically any key length. Also, we observe 
that our attacks are the first to apply to HMAC and not to NMAC, thus helping 
the community to understand what security we lose when going from NMAC to 
HMAC. 

Even if our attack is only theoretical due to its high birthday complexity, it 
is interesting to study how one can patch the scheme and avoid this related-key 
issue. Since one of the best features of HMAC is that it uses a hash function 
as a black box, without any need to change the primitive implementation, our 
goal is to find a patch that does not affect the hash function definition. Indeed, 
an easy and efficient tweak would be for example to force different IVs for the 
inner and outer instances of H in HMAC, but that would require modifying H’s 
implementation. We note that truncating the output of HMAC would also work 
(the attacker would have to successively guess the truncated bits for each received 
query in order to continue the computation chain), but we do not consider this 
solution as satisfactory because reducing the output length will directly reduce 
the expected generic security of the MAC algorithm. 

A first try could be to xor some distinct constants to the inner and/or outer 
hash message input in an attempt to separate the $f_{K_{out}}$ and $f_{K_{in}}$ computations. 
However, with such a patch, an attacker can adapt his query strategy and still 
perform a modified version of the attack from Section 0 to maintain the compu- 
tation chains synchronized. 

Our proposed solution is instead to force an extra fixed bit (or byte) before the 
input message M. This patch would not harm the efficiency of the scheme much, 
since only one bit (or one byte) would be added to the message to hash for the 
inner hash function call (in fact, the efficiency will be the same if the message 
plus one bit still fits in the same number of message blocks). Also, this patch 
can even be applied on top of HMAC, as a preprocessing phase before calling the 
primitive, thus allowing the use of existing HMAC libraries without having to modify 
them. 

The related-key distinguishing-R attack from Section 3 is thwarted because 
the inner and outer functions are now made distinct, even when querying with 
keys K and $K' = K \oplus opad \oplus ipad$. The attacker can no longer adapt the queries to 
circumvent this countermeasure and keep the computation chains synchronized. 
The security proofs of HMAC still hold with this patch, since it is trivial to see that 
any attack on this new proposal will also apply to HMAC. 

Note that adding this extra bit (or byte) to the input of the outer hash 
function instead of the inner one, in an attempt to not reduce the efficiency 
(in most cases the hash function output size n is much smaller than its message 
input size m and fits in one block, thus the efficiency would very likely 
remain exactly the same), would not prevent the attack from Section 3 from being 
applicable, since the attacker could simply adapt his query strategy: instead of 
getting a value V from the HMAC oracle and then querying this value V again, etc., 
he could simply prepend a 0 to the received value and query $0 \| V$, 
eventually getting the K and K' computations synchronized again. 

We observed that appending or prepending the extra bit to the message actually 
has a different impact on security. For the former, the distinguishing-H 
attack (approach I) from Section 6.1 can still apply in the case of a narrow-pipe 
internal hash function, while for the latter the attacker can no longer play with 
$pad_2$ to absorb the prepended bit. Thus, our final proposal is to simply 
prepend a 0 bit (or byte) to the input message of HMAC. Namely, this 
new version HMAC' would be defined as 

$$\mathrm{HMAC}'(K, M) = H_{K \oplus opad}(H_{K \oplus ipad}(0 \| M)) = H_{K_{out}}(H_{K_{in}}(0 \| M)) = \mathrm{HMAC}(K, 0 \| M)$$
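Since the patch is a pure preprocessing step, it can be layered on an unmodified HMAC library. The following is a minimal sketch using Python's standard `hmac` module, assuming the byte variant of the patch (a zero byte as the fixed prefix) and SHA-256 as the underlying hash; the function name `hmac_prime` is ours.

```python
import hashlib
import hmac

def hmac_prime(key: bytes, msg: bytes) -> bytes:
    """HMAC'(K, M) = HMAC(K, 0 || M), with a zero byte as the fixed prefix."""
    return hmac.new(key, b"\x00" + msg, hashlib.sha256).digest()

def verify_prime(key: bytes, msg: bytes, tag: bytes) -> bool:
    """Constant-time verification of a patched MAC value."""
    return hmac.compare_digest(hmac_prime(key, msg), tag)

example_key = b"k" * 64          # full-block key, the case the patch targets
example_tag = hmac_prime(example_key, b"message")
print(verify_prime(example_key, b"message", example_tag))
```

Note that the verifier must apply the same prefix; a tag computed with plain HMAC on the same message will not verify under the patched scheme.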

Taking into account the fact that the related-key attacks described in this article 
only work for special key lengths, we propose to apply our patch to HMAC only 
when k = m or k = m - 1. 

We leave as an open problem to find a patch that has no impact on the 
efficiency (not even a single bit), without modifying the implementation of the 
hash function H (thus without using distinct IVs for the outer and inner hash 
calls). 

As a final remark, we observe that for HMAC one should only consider related 
keys of the same length as the original key. Indeed, for HMAC one can easily 
check that when the length of the key K is not a multiple of m, then the key 
$K' = K \| 0$ is equivalent to K in the sense that $\mathrm{HMAC}_K(M) = \mathrm{HMAC}_{K'}(M)$ for 
any message M (this related-key relation is even valid in the formalization of 
related-key attacks from Bellare and Kohno 0 since no two different keys have 
the same related-key). This is due to the fact that the padding of the key (so that 
its length becomes a multiple of m) is weak and does not distinguish between keys 
of different lengths. A possible patch in order to avoid any equivalent key would 
be to simply pad the key with a 1 and as many zeros as needed (possibly none) 
such that the length of $K \| 10 \ldots 0$ is a multiple of m, instead of the original 0 ... 0 padding. 
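The equivalent-key behaviour is easy to observe with a standard HMAC implementation. The sketch below uses Python's `hmac` module with SHA-256; `pad_key_10` is a hypothetical byte-level analogue of the "1 then zeros" padding suggested above (it appends the byte 0x80 and then zero bytes), written only for illustration and restricted to keys shorter than one block.

```python
import hashlib
import hmac

BLOCK = 64  # SHA-256 block size in bytes

def mac(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

def pad_key_10(key: bytes) -> bytes:
    """Byte-level '10...0' key padding: append 0x80, then zeros up to one block."""
    assert len(key) < BLOCK
    return (key + b"\x80").ljust(BLOCK, b"\x00")

key, msg = b"secret", b"message"
# Standard zero padding: K and K || 0 are padded to the same block.
same_without_patch = mac(key, msg) == mac(key + b"\x00", msg)
# With the unambiguous padding applied first, the two keys stay distinct.
same_with_patch = mac(pad_key_10(key), msg) == mac(pad_key_10(key + b"\x00"), msg)
print(same_without_patch, same_with_patch)
```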

8 Conclusion 

In this article we introduced a new type of distinguishing-R, distinguishing-H, 
internal state recovery and forgery attacks for HMAC in the related-key setting. 
While the applicability of this attack is only theoretical, it uses a novel attack 
angle, the cycle length. It is the first attack that applies to HMAC and not to 
NMAC, and it provides a better understanding of the role of the constants ipad 
and opad. We also showed that our attacks can be avoided with a simple patch 
that only prepends 1 bit or 1 byte to the head of a message. 

Abstract. The “five-card trick” invented by Boer allows Alice and Bob 
to securely compute the AND function of their secret inputs using five 
cards — three black cards and two red cards — with identical backs. This 
paper shows that such a secure computation can be done with only four 
cards. Specifically, we give a protocol to achieve a secure computation 
of AND using only four cards — two black and two red. Our protocol is 
optimal in the sense that the number of required cards is minimum. 


1 Introduction 

Assume that two honest-but-curious players Alice and Bob, who hold secret bits 
a ∈ {0, 1} and b ∈ {0, 1}, respectively, wish to securely compute the AND function, 
that is, they want to learn the value of a ∧ b without revealing more of their 
own secret bits than necessary. The “five-card trick” invented in 1989 by Boer [2] 
achieves such a secure computation of AND using five cards [♣][♣][♣][♥][♥]. Now, 
more than two decades after the invention of the five-card trick, this paper improves 
upon the result: we show that the same secure computation can be done 
using only four cards [♣][♣][♥][♥]. 

This paper begins with an overview of the five-card trick. 

1.1 The Five-Card Trick 

The “five-card trick” by Boer [2] is an elegant secure AND computation protocol 
that uses three [♣]s and two [♥]s. Before going into the details of the protocol, 
we first mention the properties of cards appearing in this paper. 

All cards of the same type ([♣] or [♥]) are assumed to be indistinguishable 
from one another. We use [?] to denote a card lying face down. We also assume 
that the back [?] of each card is identical. To deal with Boolean values, we use 
the following encoding: 

[♣][♥] = 0,   [♥][♣] = 1.   (1) 

Given a bit x ∈ {0, 1}, a pair of face-down cards [?][?] whose value is equal to 
x (according to the encoding rule (1) above) is called a commitment to x, and 
is expressed as 

[?][?] 
  x 

X. Wang and K. Sako (Eds.): ASIACRYPT 2012, LNCS 7658, pp. 598-gUU] 2012. 

(c) International Association for Cryptologic Research 2012 

We now explain how to play the five cards in Boer’s secure AND protocol. First, 
given two cards [♣][♥] out of the five cards, Alice privately makes a commitment 
to her secret bit a (without Bob’s knowing the order of the two cards); similarly, 
Bob makes a commitment to the negation $\bar{b}$ of his secret bit b. Then, with the 
remaining one card [♣], two commitments are put forth as follows: 

[?][?] [♣] [?][?] 
  a          $\bar{b}$ 

It should be noted that the three cards in the middle would be [♣][♣][♣] only 
when a = b = 1 (if the second and fourth cards from the left were turned over). 

Next, Alice and Bob turn the centered card [♣] face down, and apply a random 
cut, which is denoted by ⟨·⟩: 

[?][?][♣][?][?] → ⟨ [?][?][?][?][?] ⟩ → [?][?][?][?][?] 


A random cut (also called a random cyclic shuffling) means that, as in the case 
of usual card games, a random number of leftmost cards are moved to the right 
without changing their order (of course, the random number must be unknown 
to Alice and Bob); to implement this, it suffices that Alice and Bob take turns 
cutting the deck until they are satisfied. Finally, Alice and Bob reveal all five 
cards. Then, the resulting sequence is either 

[♣][♣][♣][♥][♥]   or   [♥][♣][♥][♣][♣]   (2) 

apart from cyclic rotations, where the three [♣]s are either “cyclically” consecutive 
or not. One can easily verify that the former case implies a ∧ b = 1, and 
the latter case implies a ∧ b = 0. 

This is the five-card trick, a simple and elegant secure AND protocol. 
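The claim that the revealed sequence depends only on a ∧ b can be verified exhaustively. The sketch below is an illustration with names of our own choosing: cards are modelled as the characters "C" (club) and "H" (heart), the encoding is rule (1), and a random cut is modelled as a cyclic rotation by every possible offset.

```python
from itertools import product

CLUB, HEART = "C", "H"

def commit(bit: int) -> list:
    """Encoding (1): club-heart = 0, heart-club = 1."""
    return [HEART, CLUB] if bit else [CLUB, HEART]

def five_card_layout(a: int, b: int) -> list:
    """Commitment to a, one extra club, commitment to the negation of b."""
    return commit(a) + [CLUB] + commit(1 - b)

def clubs_cyclically_consecutive(cards: list) -> bool:
    n = len(cards)
    return any(all(cards[(s + i) % n] == CLUB for i in range(3)) for s in range(n))

def five_card_and(a: int, b: int, shift: int) -> int:
    """Run the protocol with a cut that moves `shift` leftmost cards to the right."""
    cards = five_card_layout(a, b)
    cut = cards[shift:] + cards[:shift]
    return int(clubs_cyclically_consecutive(cut))

def revealed_sequences(a: int, b: int) -> set:
    """All sequences that can be revealed for inputs (a, b), over every cut."""
    cards = five_card_layout(a, b)
    return {"".join(cards[s:] + cards[:s]) for s in range(5)}

all_correct = all(five_card_and(a, b, s) == (a & b)
                  for a, b, s in product((0, 1), (0, 1), range(5)))
print(all_correct)
```

The three input pairs with a ∧ b = 0 yield the same set of possible revealed sequences, which is why a uniformly random cut hides which of them occurred.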


1.2 Our Result and Related Work 


In this paper, we reduce the number of required cards by one, compared to the 
five-card trick, as listed in Table 1. That is, given commitments 

[?][?] [?][?] 
  a      b 

to Alice’s bit a and Bob’s bit b, our protocol needs no card other than the four 
cards constituting the two commitments, i.e., it can securely evaluate the value 
of a ∧ b without the use of any additional card. Therefore, as long as one adopts 
the encoding rule (1), our protocol is optimal in the sense that the number of 
required cards is minimum, because at least four cards are necessary for the two 
inputs a and b. 

Table 1. The five-card trick and our protocol with their performance 

o Secure AND in a non-committed format 

                    # of card types   # of cards 
Boer [2] (§1.1)            2              5 
Ours (§2)                  2              4 

Since the invention of the five-card trick, there have been several card-based 
protocols for secure computation, as listed in Table 2. All these protocols produce 
their output (say, a ∧ b) in a committed format, i.e., their output is described as 
a sequence like 

[?][?] 
a ∧ b 

that follows the encoding rule (1) (and Alice and Bob have no knowledge about 
the value other than their own secret bits). In contrast, the five-card trick and our 
protocol (given in Section 2) output the value of a ∧ b in a non-committed format, 
i.e., the format of the output a ∧ b differs from the format of the inputs a and b, namely 
the encoding rule (1) (recall the resulting sequences (2), which are completely 
revealed to the public at the end of the protocol). 


Table 2. The “committed format” protocols 

o Secure AND in a committed format 

                        # of card types   # of cards   avg. # of trials 
Crepeau-Kilian                 4              10              6 
Niemi-Renvall                  2              12             2.5 
Stiglic                        2               8              2 
Mizuki-Sone                    2               6              1 

o Secure XOR in a committed format 

                        # of card types   # of cards   avg. # of trials 
Crepeau-Kilian                 4              14              6 
Mizuki-Uchiike-Sone            2              10              2 
Mizuki-Sone                    2               4              1 

Thus, all the card-based protocols are categorized into two types: 
“non-committed format” protocols (Table 1) and “committed format” protocols 
(Table 2); this paper addresses the former. Note that in Table 2 every protocol 
whose average number of trials is more than 1 is a Las Vegas algorithm. 

While card-based protocols might fall within the area of cryptography without 
computers 0, recreational cryptography 0 or human-centric cryptography 0, we 
believe that this type of research will help professional cryptographers intuitively 


The Five-Card Trick Can Be Done with Four Cards 601 


explain to nonspecialists the nature of their constructed cryptographic protocols 
(e.g. |2|). That is, card-based protocols would help ordinary people understand 
what secure computations are, or, more fundamentally, what cryptography is. 
Furthermore, it should be noted that some of the card-based protocols are 
implemented and used in online games |B|. 

The remainder of this paper is organized as follows. In Section 2, we give 
a description of our four-card secure AND protocol. In Section 3, we show the 
correctness of our protocol, that is, we prove that our protocol securely computes 
the AND function. This paper concludes in Section 4 with an open question. 

2 Description of Our Protocol 

In this section, we design a new card-based protocol that securely computes the 
AND function using only four cards [♣][♣][♥][♥]. 

In Section 2.1, we first introduce the “random bisection cut” 0 used in our 
protocol. We then describe our protocol in Section 2.2. 

2.1 Random Bisection Cuts 

As seen in Section 1.1, applying a random cut to a sequence of face-down cards
results in a sequence such that a random number of leftmost cards are moved to
the right without changing their order. In contrast, a “random bisection cut”
works differently, as follows.

Given a deck of (an even number of) face-down cards, say

    ? ?   ? ?
     a     b

bisect it and randomly switch the resulting two decks; such a card shuffling
operation is called a random bisection cut. For the example above, a random
bisection cut, denoted by [ · || · ], works as

    ? ?   ? ?   →   [ ? ? || ? ? ]   →   ? ? ? ?
     a     b

where the resulting deck of the four cards is either

    ? ?   ? ?          ? ?   ? ?
     a     b     or     b     a

and each case occurs with probability exactly 1/2.

Although at first glance a random bisection cut seems to be a little bit less 
natural operation compared to a (normal) random cut, we hope that people will 
feel a random bisection cut to be an easy-to-implement operation some day. If 
Alice and Bob are not familiar with playing cards, then they may hold each 
of the two bisected decks together using a clip before shuffling the two decks. 
Alternatively, they may put each of the two decks into an envelope (without 
changing the order of the cards), and shuffle the two envelopes. 
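The difference between the two shuffles can be made concrete with a minimal Python sketch. The function names and the list-based deck model are illustrative assumptions, not part of the paper; a deck is simply a list read from left to right.

```python
import random

def random_cut(cards, rng=random):
    """Move a uniformly random number of leftmost cards to the right, keeping order."""
    k = rng.randrange(len(cards))
    return cards[k:] + cards[:k]

def random_bisection_cut(cards, rng=random):
    """Bisect an even-sized deck and swap the two halves with probability 1/2."""
    if len(cards) % 2 != 0:
        raise ValueError("a random bisection cut needs an even number of cards")
    half = len(cards) // 2
    left, right = cards[:half], cards[half:]
    # One fair coin decides whether the halves trade places.
    return right + left if rng.getrandbits(1) else left + right
```

A random cut of four cards has four possible outcomes (one per cyclic shift), whereas a random bisection cut has only two: the halves either stay in place or trade places.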
2.2 The Protocol

Given a commitment to Alice’s bit a and a commitment to Bob’s bit b, our
four-card AND protocol proceeds as follows.

1. Apply a random bisection cut:

       ? ?   ? ?   →   [ ? ? || ? ? ]   →   ? ? ? ?
        a     b

2. Apply a random cut to the two cards in the middle, namely the second and
third cards:

       ? ? ? ?   →   ? ⟨? ?⟩ ?   →   ? ? ? ?

3. Reveal the second card.

(a) If the face-up second card is ♣, then open the fourth card. We now have
either

       ? ♣ ? ♣   or   ? ♣ ? ♥

The former case implies a ∧ b = 1, and the latter case implies a ∧ b = 0.

(b) If the face-up second card is ♥, then open the first card. We now have
either

       ♣ ♥ ? ?   or   ♥ ♥ ? ?

The former case implies a ∧ b = 0, and the latter case implies a ∧ b = 1.

As described above, our protocol makes one random bisection cut (in step 1) and
one random cut (in step 2). After those cuts, two cards are eventually revealed,
namely either (a) the second and fourth cards, or (b) the first and second cards,
depending on the result of revealing the second card in step 3. Note that if the
two face-up cards are of the same type (♣ ♣ or ♥ ♥), then we have a ∧ b = 1;
otherwise, we have a ∧ b = 0.

We show why our protocol works in the next section. 
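As an informal sanity check, the three steps can be simulated in a few lines of Python. This is only a sketch under stated assumptions: the encoding ♣♥ = 0 and ♥♣ = 1 is taken to be the commitment encoding (it agrees with the initial sequences listed in Table 3), and the helper names `commit` and `four_card_and` are hypothetical.

```python
import random

CLUB, HEART = "♣", "♥"

def commit(bit):
    """Two-card commitment (assumed encoding): ♣♥ encodes 0, ♥♣ encodes 1."""
    return [HEART, CLUB] if bit else [CLUB, HEART]

def four_card_and(a, b, rng=random):
    deck = commit(a) + commit(b)
    # Step 1: random bisection cut (swap the two halves with probability 1/2).
    if rng.getrandbits(1):
        deck = deck[2:] + deck[:2]
    # Step 2: random cut of the two middle cards; for two cards this is
    # a swap with probability 1/2.
    if rng.getrandbits(1):
        deck[1], deck[2] = deck[2], deck[1]
    # Step 3: reveal the second card, then the fourth card if it is ♣,
    # or the first card if it is ♥.
    second = deck[1]
    other = deck[3] if second == CLUB else deck[0]
    # Two revealed cards of the same type mean a AND b = 1.
    return 1 if second == other else 0
```

Running `four_card_and` repeatedly on each of the four input pairs should always return `a & b`, whichever way the two shuffles fall.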

3 Correctness of Our Protocol 

In this section, we prove that the protocol given in the previous section securely
computes a ∧ b. First, in Section 3.1, we intuitively explain why our protocol
works. Then, we verify the correctness of our protocol in Section 3.2.


3.1 An Intuitive Sketch 


Given a commitment

    ? ?
     a

note that each individual card constituting the commitment inherently has the
value of the bit a (or of its negation ā), that is, one can also write

    ?  ?
    a  ā

where an encoding rule for a single card is taken: ♣ expresses 0, and ♥
expresses 1. Based on such a single-card encoding, commitments to a and b can be
expressed as:

    ?  ?  ?  ?          (3)
    a  ā  b  b̄

Now, for the expression (3), skip step 1 in our protocol and directly apply step
2. That is, apply a random cut to the second and third cards:

    ?  ⟨?  ?⟩  ?
    a   ā  b   b̄

    →   (i)  ?  ?  ?  ?     or   (ii)  ?  ?  ?  ?
             a  ā  b  b̄                a  b  ā  b̄

Next, applying step 3, reveal the second card. Assume that the face-up second
card is ♣ (as in step 3(a)), i.e.,

    (i)  ?  ♣  ?  ?     or   (ii)  ?  ♣  ?  ?          (4)
         a  ā  b  b̄                a  b  ā  b̄

Then, it means that either (i) ā = 0 or (ii) b = 0 (because ♣ = 0). If (i) ā = 0,
then a = 1 and hence a ∧ b = b; if (ii) b = 0, then a ∧ b = 0 = b. Therefore,
in either case, we have a ∧ b = b, and hence the value of a ∧ b = b can be
obtained by revealing the fourth card

    ?
    b̄

in the sequence (4). Indeed, in step 3(a), the fourth card is opened. If it is ♣,
then b̄ = ♣ = 0 and hence a ∧ b = b = 1; if it is ♥, then b̄ = ♥ = 1 and hence
a ∧ b = b = 0.

Thus, steps 2 and 3(a) surely compute the value of a ∧ b. One can similarly
verify the claim for the case of step 3(b). Therefore, steps 2 and 3 can provide at
least the value of a ∧ b. However, they also leak some secret information about
a and b; indeed, for example, when the second card revealed in step 3 is ♣,
we have ā = 0 or b = 0 (as seen above), and hence the fact that (a, b) ≠ (0, 1)
has been disclosed. Therefore, since executing only steps 2 and 3 is not secure,
our protocol applies a random bisection cut in step 1 to guarantee secrecy, as
intuitively explained below.

Note first that adding step 1 never affects the computation outcome of
executing steps 2 and 3: after applying a random bisection cut in step 1, we have
either

    ?  ?  ?  ?     or     ?  ?  ?  ?          (5)
    a  ā  b  b̄            b  b̄  a  ā

and then applying steps 2 and 3 to the sequence (5) always provides the value
of a ∧ b as shown above (because a ∧ b = b ∧ a).

To see that the secrecy is preserved by introducing a random bisection cut in
step 1, we enumerate all possibilities:

    initial          after step 1         after step 2

                     ?  ?  ?  ?           ?  ?  ?  ?     or   ?  ?  ?  ?
    ?  ?  ?  ?       a  ā  b  b̄     →     a  ā  b  b̄          a  b  ā  b̄
    a  ā  b  b̄   →
                     ?  ?  ?  ?           ?  ?  ?  ?     or   ?  ?  ?  ?
                     b  b̄  a  ā     →     b  b̄  a  ā          b  a  b̄  ā

Therefore, opening the second card (in the rightmost sequence) means that one
of ā, b, b̄ and a is randomly revealed. Hence, opening the second card never leaks
any information about a or b.
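The intuition above can be checked by exhaustive enumeration. The following Python sketch (an illustration, not the authors' proof) computes the exact distribution of the first revealed card, with and without the random bisection cut of step 1. The encoding ♣♥ = 0, ♥♣ = 1 and all function names are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

CLUB, HEART = "♣", "♥"

def commit(bit):
    """Assumed encoding: ♣♥ encodes 0, ♥♣ encodes 1."""
    return (HEART, CLUB) if bit else (CLUB, HEART)

def second_card_distribution(a, b, use_step1):
    """Exact distribution of the card revealed first in step 3."""
    dist = {CLUB: Fraction(0), HEART: Fraction(0)}
    bisect_choices = (False, True) if use_step1 else (False,)
    weight = Fraction(1, 2 * len(bisect_choices))
    for swap_halves, swap_middle in product(bisect_choices, (False, True)):
        deck = list(commit(a) + commit(b))
        if swap_halves:          # step 1: random bisection cut
            deck = deck[2:] + deck[:2]
        if swap_middle:          # step 2: random cut of the middle two cards
            deck[1], deck[2] = deck[2], deck[1]
        dist[deck[1]] += weight
    return dist
```

With `use_step1=True` the revealed card is ♣ or ♥ with probability 1/2 each for every input pair. With `use_step1=False` the input (0, 1) always shows ♥ and (1, 0) always shows ♣, which is the leak described in the text.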


3.2 Proof of Correctness 

In this subsection, we prove that our protocol works correctly. 

Recall that a random bisection cut [ ? ? || ? ? ] in step 1 and a random cut
? ⟨? ?⟩ ? in step 2 are applied to the two commitments; one can enumerate,
as in Table 3, all possibilities of the four cards after each of steps 1 and 2. Note
that the cases (a, b) = (0, 1) and (a, b) = (1, 0) both fall in the same status
category after step 1 (and after step 2, of course).

Consider the actual execution of our protocol based on Table 3. After step
3, all possibilities can be enumerated as shown in Table 4. (Remember that the
second card is opened in step 3, and that if it is ♣, then the fourth card is
opened; otherwise, the first one is opened.) Table 4 immediately implies that
our protocol surely computes the value of a ∧ b: if the two revealed cards are
of the same type, then a ∧ b = 1; otherwise, a ∧ b = 0.

To verify the secrecy of the protocol, it suffices to show that 

    Pr[(0, 0) | ? ♣ ? ♥] = Pr[(0, 0) | ♣ ♥ ? ?] = Pr[(0, 0) | a ∧ b = 0],

Table 3. All possibilities of the four cards after each of steps 1 and 2

  (a,b)    initial      after step 1             after step 2
  (0,0)    ♣ ♥ ♣ ♥      ♣ ♥ ♣ ♥                  ♣ ♥ ♣ ♥ or ♣ ♣ ♥ ♥
  (0,1)    ♣ ♥ ♥ ♣      ♣ ♥ ♥ ♣ or ♥ ♣ ♣ ♥       ♣ ♥ ♥ ♣ or ♥ ♣ ♣ ♥
  (1,0)    ♥ ♣ ♣ ♥      same as (0,1)            same as (0,1)
  (1,1)    ♥ ♣ ♥ ♣      ♥ ♣ ♥ ♣                  ♥ ♣ ♥ ♣ or ♥ ♥ ♣ ♣

Table 4. All possibilities of the four cards after step 3

  (a,b)    initial      after step 3
  (0,0)    ♣ ♥ ♣ ♥      ? ♣ ? ♥ or ♣ ♥ ? ?
  (0,1)    ♣ ♥ ♥ ♣      ? ♣ ? ♥ or ♣ ♥ ? ?
  (1,0)    ♥ ♣ ♣ ♥      ? ♣ ? ♥ or ♣ ♥ ? ?
  (1,1)    ♥ ♣ ♥ ♣      ? ♣ ? ♣ or ♥ ♥ ? ?


    Pr[(0, 1) | ? ♣ ? ♥] = Pr[(0, 1) | ♣ ♥ ? ?] = Pr[(0, 1) | a ∧ b = 0],

and

    Pr[(1, 0) | ? ♣ ? ♥] = Pr[(1, 0) | ♣ ♥ ? ?] = Pr[(1, 0) | a ∧ b = 0].

Let Pr[a = 0] = p and Pr[b = 0] = q. Then, we have Pr[(0, 0) | a ∧ b = 0] =
pq/(p + q − pq). On the other hand,

    Pr[(0, 0) | ? ♣ ? ♥] = (pq/2) / (pq/2 + p(1 − q)/2 + (1 − p)q/2)
                         = pq/(p + q − pq),

as desired. For all the remaining cases, one can easily check the equality.
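The remaining equalities can also be checked mechanically. The sketch below is an illustration, not part of the paper's proof: it enumerates both shuffles with exact rational arithmetic and computes the posterior distribution over (a, b) for every face-up pattern that can appear after step 3. The encoding ♣♥ = 0, ♥♣ = 1 and the helper names are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

CLUB, HEART = "♣", "♥"

def commit(bit):
    """Assumed encoding: ♣♥ encodes 0, ♥♣ encodes 1."""
    return (HEART, CLUB) if bit else (CLUB, HEART)

def view(a, b, swap_halves, swap_middle):
    """Face-up pattern after step 3; '?' marks a card that stays face-down."""
    deck = list(commit(a) + commit(b))
    if swap_halves:
        deck = deck[2:] + deck[:2]
    if swap_middle:
        deck[1], deck[2] = deck[2], deck[1]
    if deck[1] == CLUB:
        return ("?", deck[1], "?", deck[3])
    return (deck[0], deck[1], "?", "?")

def posteriors(p, q):
    """Map each view to {(a, b): Pr[(a, b) | view]}, with Pr[a=0]=p and Pr[b=0]=q."""
    joint = {}
    for a, b in product((0, 1), repeat=2):
        prior = (p if a == 0 else 1 - p) * (q if b == 0 else 1 - q)
        for sh, sm in product((False, True), repeat=2):
            cell = joint.setdefault(view(a, b, sh, sm), {})
            # Each of the four shuffle outcomes has probability 1/4.
            cell[(a, b)] = cell.get((a, b), Fraction(0)) + prior * Fraction(1, 4)
    return {v: {ab: pr / sum(d.values()) for ab, pr in d.items()}
            for v, d in joint.items()}
```

For any rational p and q strictly between 0 and 1, both views with output 0 (? ♣ ? ♥ and ♣ ♥ ? ?) should yield the same posterior as conditioning on a ∧ b = 0 alone, for example pq/(p + q − pq) for (0, 0).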

4 Conclusions 

In this paper, we presented a four-card secure AND protocol whose output is in 
a non-committed format. Since the existing protocol, namely the five-card trick, 
requires five cards, we have succeeded in reducing the number of required cards 
by one. Our protocol is optimal in the sense that at least four cards are required 
for commitments to the inputs a and b. 

Note that the OR function can also be securely computed by using four cards,
say, according to de Morgan’s law a ∨ b = ¬(ā ∧ b̄).
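One way to read this remark is sketched below. It is an illustration under assumptions: a commitment is negated by swapping its two cards (a standard trick in card-based protocols, assumed here rather than taken from the text), the encoding is ♣♥ = 0 and ♥♣ = 1, and the function names are hypothetical.

```python
import random

CLUB, HEART = "♣", "♥"

def commit(bit):
    """Assumed encoding: ♣♥ encodes 0, ♥♣ encodes 1."""
    return [HEART, CLUB] if bit else [CLUB, HEART]

def negate(commitment):
    """Swapping the two cards of a commitment flips the committed bit."""
    return commitment[::-1]

def and_from_commitments(ca, cb, rng=random):
    """Four-card AND protocol run on two existing commitments."""
    deck = list(ca) + list(cb)
    if rng.getrandbits(1):                      # step 1: random bisection cut
        deck = deck[2:] + deck[:2]
    if rng.getrandbits(1):                      # step 2: random cut of the middle cards
        deck[1], deck[2] = deck[2], deck[1]
    second = deck[1]                            # step 3: reveal two cards
    other = deck[3] if second == CLUB else deck[0]
    return 1 if second == other else 0

def four_card_or(a, b, rng=random):
    """OR via de Morgan: negate both inputs, run AND, negate the output."""
    return 1 - and_from_commitments(negate(commit(a)), negate(commit(b)), rng)
```

Since the output is in a non-committed format, negating it costs nothing: the players just read "same type" as 0 and "different types" as 1.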

This paper showed a secure AND computation in a non-committed format
using only four cards. For the committed format case, the best known protocol
requires six cards, as seen in Table 2. An intriguing open question is whether
there exists a “committed format” AND protocol that requires fewer than six
cards.

Abstract. We construct a provably secure mix-net from any CCA2 secure
cryptosystem. The mix-net is secure against active adversaries that statically corrupt
fewer than λ out of k mix-servers, where λ is a threshold parameter, and it is robust
provided that at most min(λ − 1, k − λ) mix-servers are corrupted.

The main component of our construction is a mix-net that outputs the cor- 
rect result if all mix-servers behaved honestly, and aborts with probability 1 — 
otherwise (without disclosing anything about the inputs), where t 
is an auxiliary security parameter and H is the number of honest parties. The 
running time of this protocol for long messages is roughly 3tc, where c is the
running time of Chaum’s mix-net (1981). 


1 Introduction 

A mix-net, introduced by Chaum in 1981, is a tool to provide anonymity for a group
of senders. The main application is electronic voting, in which each sender submits an
encrypted vote and the mix-net then outputs the votes in sorted order. Mix-nets have also
found applications in other areas, e.g., anonymous web browsing, payment systems,
and even as a building block for secure multiparty computation.

A mix-net is constructed as a cryptographic protocol by invoking a set of mix-servers, 
arranged in a series. The original mix-net proposed by Chaum works as follows. To 
set up, each mix-server publishes a public key for an encryption system. Each sender 
then publishes a “wrapped” message with several layers of encryption: starting with 
the innermost layer — an encryption of her plaintext message using the last mix-server’s 
public key — and ending with the outermost layer, encrypted using the first mix-server’s 
public key. Once all senders have published their encrypted inputs, the mixing stage 
begins. In turn, each mix-server receives the encrypted values output from the previous 
server, “peels off” a layer of encryption, i.e., decrypts the values using his private key, 
sorts the decrypted values and passes them on to the next mix-server in the chain. The 
output of the final mix-server is the sorted list of the senders’ original inputs. 
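The layered structure described above can be illustrated with a toy Python model. This is only a sketch: `encrypt` and `decrypt` are hypothetical stand-ins that merely tag a payload with a key identifier and provide no security whatsoever, and the function names are assumptions made for the example.

```python
def encrypt(key_id, payload):
    """Toy stand-in for public-key encryption: NOT secure, it only records the layer."""
    return ("ct", key_id, payload)

def decrypt(key_id, ciphertext):
    """Peel one layer; fails if the layer belongs to a different key."""
    tag, layer_key, payload = ciphertext
    if tag != "ct" or layer_key != key_id:
        raise ValueError("ciphertext was not encrypted under this key")
    return payload

def wrap(message, server_ids):
    """Sender side: innermost layer for the last server, outermost for the first."""
    ct = message
    for key_id in reversed(server_ids):
        ct = encrypt(key_id, ct)
    return ct

def chaum_mix(inputs, server_ids):
    """Each server in turn peels one layer and sorts before passing the list on."""
    batch = list(inputs)
    for key_id in server_ids:
        # Sorting by repr gives a canonical order that hides the input order.
        batch = sorted((decrypt(key_id, ct) for ct in batch), key=repr)
    return batch
```

With three servers and the messages "yes", "no" and "abstain", `chaum_mix` returns the messages in sorted order regardless of the order in which the wrapped inputs were submitted. A real instantiation would use a randomized CCA2-secure cryptosystem in place of the tagging functions.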

Chaum’s mix-net hides the correspondence between the input ciphertexts and the
output plaintexts, but even a single mix-server can undetectably modify the output or
refuse to take part in the protocol (forcing the protocol to abort without output). These
drawbacks have been addressed in previous work. The most widely researched line of
work is based on the idea of re-encryption mixes (originally proposed by Park, Itoh and
Kurosawa); these rely on homomorphic encryption schemes whose ciphertexts can
be “re-randomized”. Using the homomorphic properties of the encryption scheme, it
is possible to generate very efficient zero-knowledge proofs that the mixing was
performed correctly (e.g., Neff or Furukawa and Sako). While the state-of-the-art
re-encryption mixes are both provably secure and efficient for short inputs, their reliance
on homomorphic properties limits them to a few specific encryption schemes.

1.1 Our Contribution 

In this paper, we propose a new, efficient mix-net protocol that satisfies several highly
desirable properties:

- Minimal Cryptographic Assumptions. Our protocol can be based on any CCA2- 
secure cryptosystem, without requiring additional assumptions. In particular, we 
do not require the underlying encryption to have homomorphic properties. 

While interesting from a theoretical standpoint, this also has clear advantages 
in practice, as it gives greater flexibility in the choice of encryption scheme. For 
example, all currently practical homomorphic encryption schemes are susceptible 
to attacks from quantum computers. Although we do not currently know how to 
build quantum computers, it is important to take this vulnerability into account 
when using a mix-net as part of an electronic election scheme: ballot privacy is 
often required to be preserved for decades — these timeframes may be long enough 
for the development of a working quantum computer. 

Furthermore, the flexibility in the choice of encryption scheme makes it easy
to deal efficiently with long inputs. While there do exist mix-nets that can deal
with long inputs efficiently, these mixes require even more specialized
encryption schemes tailored specifically to that purpose.

- Provable Security. Many of the existing mixing protocols do not have formal proofs 
of security. This may seem like a purely theoretical concern, but the history of cryp- 
tographic protocols, and mix-nets in particular, shows that there is good reason to 
distrust heuristic approaches. A notable example of this is the Randomized Partial
Checking (RPC) scheme of Jakobsson, Juels and Rivest (our main “competitor”
in the field of generic CCA2-based mixes). The RPC scheme (and related
constructions) has been around for over a decade, and has already been used in
binding elections; however, recent work by Khazaei and Wikström shows that
RPC contains a subtle but serious security flaw, which was consistently missed in
implementations. Other examples abound (see Section 1.2 for more).

In contrast, our protocol is proven secure in the Universal Composability
framework, a very strong notion of security that holds even when arbitrary additional
protocols are run concurrently. (If a cryptosystem which allows recovering the ran- 
domness from a ciphertext using the secret key is used to implement the (non zero- 
knowledge) proof of correct decryption, then the result only holds in the stand-alone 
setting.) 

- Full Security. The RPC scheme gains efficiency by relaxing slightly the security 
requirements. It prevents corrupt mix-servers from undetectably modifying many
inputs of honest senders, but a malicious server can succeed in changing a con- 
stant number of inputs with non-negligible probability. For some uses, this may 
not be acceptable. RPC also relaxes the privacy guarantees: while the exact cor- 
respondence between senders and their inputs is hidden, some information may 
still be leaked. Our protocol, with comparable or better efficiency, provides full 
simulation-based security. 

Our protocol is based on a new technique we call Trip-Wire Tracing (TWT). Our main
idea is to do away with zero-knowledge proofs (that would be costly for a generic
cryptosystem) used by existing protocols to guarantee correctness and replace them
with a virtual “trip wire” system: we insert “fake” inputs into the mix to act as trip
wires for catching misbehaving mix-servers (for more details, see Section 2).

Security Guarantees and Assumptions. The protocol preserves privacy and correctness
against active adversaries that statically corrupt fewer than λ mix-servers, where λ is a
threshold parameter, and it is robust provided that at most min(λ − 1, k − λ)
mix-servers are corrupted, where k is the number of mix-servers. As for all other mix-nets
in the literature, we assume the existence of an ideal bulletin board functionality (this 
is equivalent to a broadcast channel). We also need an ideal functionality for shared 
key generation. In the general case (when we can only assume a generic CCA2-secure 
cryptosystem without any additional structure), this functionality would have to be im- 
plemented using general MPC. However, if the chosen cryptosystem does have a more 
efficient shared-key-generation protocol, it can be used instead (in any case, the bulk of 
the work can always be carried out offline, in a preliminary key generation phase). 

Finally, we need a functionality for proving that a ciphertext is correctly decrypted, 
but it suffices that this protocol hides the secret key. This functionality can be realized 
trivially if the cryptosystem allows recovery of the randomness (used to form the cipher- 
text) using the secret key. In any case this protocol is only used to identify corrupted 
parties and mix-servers, so during normal operation it is not used at all. 

Limitations of Our Protocol. Our construction essentially uses privacy to ensure cor- 
rectness (by hiding the “trip-wires” from malicious mix servers). Because a threshold 
coalition of malicious servers can always violate privacy, our protocol loses correctness 
as well in this case. This implies that our protocol cannot be “universally verifiable” 
(i.e., verifiable by third parties who do not trust any of the mix servers). In compar- 
ison, the state-of-the-art mix-nets based on homomorphic cryptosystems can provide 
integrity (but not privacy) even if all mix-servers are corrupt. 

We remark that RPC only allows a restricted form of universal verifiability, i.e., 
its relaxed correctness degrades further and allows an adversary that controls all mix- 
servers to undetectably replace a notable number of ciphertexts. 

1.2 Related Work 

The literature on mix-nets and verifiable shuffling is extensive. Below, we mention a
small sample of particularly relevant works. Park, Itoh and Kurosawa introduced
re-encryption mixes as a way to improve efficiency: the size of the ciphertexts and the
amount of work performed by senders do not depend on the number of mix-servers.
Sako and Kilian constructed the first universally verifiable mix-net, where senders
can verify that the entire shuffle was performed correctly (and not just that their own
input was included in the output). Sako and Kilian’s construction was based on
cut-and-choose zero-knowledge proofs; Neff and Furukawa and Sako gave much more
efficient zero-knowledge proofs of shuffle for homomorphic cryptosystems. Many of
the works in the field aim to improve the efficiency of the mix-net. Our construction
includes ideas that appear in several previous papers: Jakobsson used the idea of “dummy
inputs” and “repetition” to increase correctness (although in a different way
than we do). Golle, Zhong, Boneh, Jakobsson and Juels considered mix-nets that are
“optimistic” (i.e., can be much more efficient in the case that no errors occur).

On the Importance of Formal Proofs. A recurring tale in the history of mix-net de- 
sign is the proposal of a mix-net construction followed by discovery of security flaws. 
Following Chaum’s seminal paper 0, Pfitzmann and Pfitzmann pointed out that Chau- 
mian mixes are vulnerable to attack if the encryption scheme used is malleable Ell. 
The mix-net of Park et al. ITH1 was also shown to be vulnerable to similar attacks J2Qj. 
Jakobsson’s scheme of 0 was broken in 0. His other scheme 0, was broken by 
Mitomo and Kurosawa ra, who also suggested a fix; this in turn, in addition to the 
schemes of Jakobsson and Juels ins, of Golle, Zong, Boneh, Jakobsson and Juels □ 
were all shown to be vulnerable (to various attacks) by Wikstrom m. 

While a formal proof of security is not an iron-clad guarantee that no vulnerabilities 
will ever be found (proofs may have subtle errors, and assumptions may be shown to 
be wrong), they do significantly improve the trust in the security of a cryptographic 
scheme. In fact, the need for some of the components of our protocol only became 
evident during the analysis of the protocol. 

2 Informal Description of Our Protocol 

We begin with an overview of our mix-net protocol and some intuition for why this 
protocol is secure. The main component of our construction is a mix-net that outputs 
the correct result if all mix-servers behave honestly, and aborts with overwhelming 
probability otherwise — without disclosing anything about the inputs. At a high level, 
our mix-net with abort protocol is a Chaumian mix-net with added verification. It is 
parametrized with an auxiliary security parameter t and uses two Chaumian mix-nets 
in sequence (one with “explicit verification” and one with “partial tracing”) and three 
additional layers of encryption (labeled as “final”, “repetition” and “outer”). Figure 1
presents a schematic of our protocol.

Each sender encodes her message as a bundle of t ciphertexts: First, she encrypts 
her plaintext message using the public key of the “final” layer of encryption and makes 
t identical copies of it. Next, each copy is further encrypted using the public key of 
the “repetition” encryption layer and then under the public keys of the mix-servers in 
the two Chaumian mix-nets. Finally, the t encryptions are concatenated and encrypted 
using the public key of the “outer” encryption layer. To generate the final list of inputs 
to the mix-net, each mix-server adds a “dummy” encryption of zero to the list of inputs 
submitted by the senders (the dummy input is constructed using the same operations as 
the real inputs). 

Once all parties have submitted their bundles, the decryptions proceed in the reverse 
order. If all the parties are honest, there will be t identical copies of each innermost ci- 
phertext before the final decryption takes place. In this case the dummies are traced and 
removed, the duplicates are ignored, and only one instance of each sender’s innermost 

[Figure 1: schematic of the protocol as a left-to-right pipeline: decrypt (outer), split,
Chaum’s mix-net with explicit verification, Chaum’s mix-net with partial tracing,
decrypt (repetition), sort, decrypt (final).]


Fig. 1. Execution of Protocol 0 with N = 3 senders, k = 3 mix-servers, and t = 2 repetitions, 
where all parties are honest. Each party submits a bundle of two ciphertexts containing identical 
innermost ciphertexts. The bundle is decrypted and split into two ciphertexts. All ciphertexts 
are then individually shuffled in the two instances of Chaum’s mix-net. Then the first is verified 
explicitly (revealing the permutation), the dummy ciphertexts are traced in the second (revealing 
the paths of the dummies) and the output is decrypted and verified to contain t copies of each 
ciphertext. If all tests passed, then a final round of decryption recovers the plaintexts. 


ciphertext is decrypted. We stress that this is only an outline of the protocol. Additional 
measures are taken for ensuring correctness and privacy. 
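The bundle encoding and the honest decryption path outlined above can be sketched in Python as follows. This is a simplified illustration under loud assumptions: `encrypt` and `decrypt` are hypothetical toy functions that only tag a payload with a key name and a random nonce (no security at all), a single list `mix_keys` stands for the servers' keys of both Chaumian mix-nets in order, and dummies, verification and tracing are omitted.

```python
import secrets
from collections import Counter

def encrypt(key_id, payload):
    """Toy randomized 'encryption': NOT secure, it only tags the layer with a nonce."""
    return ("ct", key_id, secrets.token_hex(8), payload)

def decrypt(key_id, ciphertext):
    tag, layer_key, _nonce, payload = ciphertext
    if tag != "ct" or layer_key != key_id:
        raise ValueError("ciphertext was not encrypted under this key")
    return payload

def make_bundle(message, t, mix_keys):
    """Sender side: final layer once, then t copies wrapped for the mix-nets."""
    inner = encrypt("final", message)
    copies = []
    for _ in range(t):
        ct = encrypt("repetition", inner)
        for key_id in reversed(mix_keys):
            ct = encrypt(key_id, ct)
        copies.append(ct)
    return encrypt("outer", tuple(copies))

def open_bundles(bundles, mix_keys):
    """Honest run: split the bundles, peel the mix layers, then the repetition layer."""
    cts = [ct for bundle in bundles for ct in decrypt("outer", bundle)]
    for key_id in mix_keys:
        cts = sorted((decrypt(key_id, ct) for ct in cts), key=repr)
    return [decrypt("repetition", ct) for ct in cts]

def check_copies(innermost, t):
    """Return the innermost ciphertexts that do not appear exactly t times."""
    return [ct for ct, n in Counter(innermost).items() if n != t]
```

In an honest run, `check_copies` returns an empty list and decrypting one instance of each innermost ciphertext under the "final" key recovers the senders' messages. If a ciphertext is dropped or replaced along the way, the affected innermost ciphertext no longer has exactly t copies and is flagged.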

To help give the intuition for our construction, we will describe a sequence of attacks 
on the Chaumian mix-net and our corresponding modifications to the protocol that pre- 
vent them. The final protocol is a composition of all these modifications. We start with 
a “core” Chaumian mix, which ends up, after slight modifications, as the box labeled
“Chaum’s mix-net with partial tracing” in Figure 1. We call a set of ciphertexts
containing identical innermost ciphertexts a copyset.

1. Elementary Error Handling. The first type of attack we consider is the
introduction of “simple” errors that are publicly detectable. Invalid ciphertexts are simply
ignored. If there are duplicates of a ciphertext in the input to a mix-server, then
exactly one copy is considered part of the input and the rest are ignored.

2. Replication. In a Chaumian mix-net, any corrupt mix-server can change the output 
undetectably by replacing an output ciphertext with a new one generated by the 
malicious server (this new ciphertext can be completely valid, except for not being 
a decryption of any input ciphertext). To prevent this attack, each sender submits t 
independently formed ciphertexts of her message to the Chaumian mix-net. 

To see why this replication technique helps prevent replacement attacks, con- 
sider a corrupt mix-server that appears between two honest mix-servers in the mix- 
net chain. In this case, the corrupt mix-server cannot identify which of the cipher- 
texts encrypts the same messages due to the following two reasons. 

(a) He does not know the secret key of the succeeding honest mix-server, and hence
he cannot fully decrypt the received ciphertexts and distinguish the copysets
based on the final decrypted values.

(b) The preceding honest mix-server randomly permuted all of the ciphertexts
and hence he does not know which ciphertext originated from which sender.

We prove that, if a CCA2 secure cryptosystem is used, t is sufficiently large, and
all messages are randomly chosen, then no efficient adversary between two honest
mix-servers can replace a proper subset of senders’ messages without detection.

3. Replication Cryptosystem. In a Chaumian mix-net, the last mix-server learns the 
final output before anyone else. Thus, even with the replication trick, the final mix- 
server can clearly cheat, since he can identify all copies of a plaintext. To prevent 
this attack we modify the protocol by adding an additional “repetition” layer of 
encryption, using a public key for which the secret key is shared between the mix- 
servers. 

Think of this as running the Chaumian mix-net on encrypted inputs rather than 
plaintexts, i.e., each sender makes t encryptions of her input with the shared public 
key of the “repetition” layer, and then uses the encrypted values as her “plaintexts”. 
The output of the Chaumian mix-net is a list of ciphertexts encrypted with the 
shared public key, which prevents the last mix-server from identifying identical 
plaintexts and replacing all copies of a subset of the plaintexts. At the end of the 
mixing, the shared secret key is recovered and decryption is performed publicly. In
Figure 1, this decryption step is the box labeled “decrypt (repetition)” right after
Chaum’s mix-net with partial tracing.

4. Additional Mix-net with Explicit Verification. The first mix-server knows how to 
partition the input messages into copysets (since he receives the messages directly 
from the senders), hence he can replace all copies of a given plaintext undetectably. 
To prevent this attack, we add a new, unmodified Chaumian mix-net (the box
labeled “Chaum’s mix-net with explicit verification” in Figure 1) between the senders
and the first mix-server in the “main” mix-net. Recall that the Chaumian mix-net
does not give any correctness guarantees, but it does guarantee privacy if even a 
single mix-server is honest. This is exactly what we need to put the first mix-server 
in the Chaumian mix-net with partial tracing on an equal footing with the others in 
the chain. 

We rely on the privacy of the first Chaumian mix-net only to obtain correctness 
via replication. Therefore, once the second Chaumian mix-net finishes its process
of mixing, the mix-servers can reveal the secret keys for the first Chaumian mix- 
net and verify its correctness completely (hence the name “mix-net with explicit 
verification”). If the verification fails, the guilty mix-server is publicly evident. 

5. Dummy Values. If a corrupt mix-server in the second Chaumian mix-net wishes to 
replace a proper subset of senders’ messages, he must guess the positions of the 
copysets, but he can still undetectably replace all of the inputs with his own values. 
To prevent this, we have every mix-server add a “dummy” value to the inputs of the 
mix-net. These dummy values are treated identically to the senders’ inputs. Thus, 
any mix-server attempting to replace the entire list of inputs would also be replac- 
ing all dummy values. The mix-servers can “trace forward” the dummy values and 
remove them from the final decrypted list if the trace completes successfully. There 
is no privacy requirement for the dummy values; therefore, each mix-server can 
simply reveal all the randomness used in the encryption of the initial dummy val- 
ues. This reveals all the internal layers of encryption in a verifiable way, allowing 
everyone to find the corresponding ciphertexts in each stage of the mix-net. 

6. Replication Verification and Error Tracing. We need to handle the case where some
of the final values, after recovering the shared secret key of the “repetition”
encryption layer, do not have exactly t copies.

We now add another step to the protocol after decryption using the shared secret:
replication verification (this occurs in the part labeled “sort” in Figure 1) and error
tracing (this is not shown in Figure 1, since we assume all parties are honest). In the
replication verification step, the (honest) mix-servers verify that there are exactly t 
duplicates of every output value. This clearly is the case if all servers and senders 
are honest. If the verification fails, however, we need to figure out who is to blame 
so that we can continue the protocol if it was just a corrupt sender. To do this, we 
need to trace errors through the system in two ways: 

(a) Backwards Tracing. After determining the messages with more or fewer than t
duplicates, we trace them backwards to identify their original senders. Since
each mix-server knows his own permutation, the backwards trace is easy to 
do: each mix-server in turn (starting from the last one and going backwards) 
publishes the “paths” taken by the traced messages along with a proof that the 
decryption was performed correctly. If a broken copyset being traced contains 
ciphertexts that were introduced by a cheating mix-server (i.e., ciphertexts that 
are not valid decryptions of the mix-server’s inputs), the mix-server will not be 
able to provide a valid trace and will be identified as a cheater at this point. 

(b) Forward Tracing. If all the broken copysets were successfully traced back to 
their sender, there are still two remaining possibilities for casting blame: 

i. The mix-servers behaved honestly, and bad copysets were submitted by 
corrupt senders. 

ii. At least one ciphertext submitted by an honest party was replaced by a 
corrupted mix-server. (This could be the case even if no cheating was dis- 
covered during backwards tracing. To see this, consider the case that a 
corrupt mix-server arbitrarily chooses t ciphertexts from honest senders 
and replaces them with a valid copyset.) 

To distinguish these two cases, we identify the senders from which the broken 
copysets originated, and “trace forward” all the messages of these senders. This 
is done similarly to the backwards tracing, but in reverse: starting from the first 
mix-server and going forwards, each one in turn publishes the paths taken by 
the traced messages along with a proof of correct decryption. If a mix-server 
cheated, he will not be able to provide a valid trace — hence he will be fingered 
as the culprit. On the other hand, if only the identified senders were cheating 
(e.g., by not encrypting a valid copyset in the first place), we will be able to 
trace the messages all the way to the output. 

If the backwards and forward tracings complete successfully without identifying
a mix-server as culprit, the ciphertexts of the corrupt senders are removed from
the output (otherwise, the protocol outputs the identity of a guilty mix-server and
aborts).

7. Final Cryptosystem. As we have described in Step 6, to catch a misbehaving
mix-server we must sometimes trace messages of honest users through the system.
Although we abort the protocol in this case, we must still preserve the honest senders’
privacy. Therefore, we protect the senders’ messages with an additional layer of
encryption (the last box labeled “decrypt (final)”). That is, a sender first encrypts
her message under the “final” public key and uses this encrypted message as an 
input to the protocol as described so far. This innermost encryption layer is jointly 
decrypted only if the protocol does not abort. If the protocol does abort, only the en- 
crypted values are revealed and privacy is protected by the final layer of encryption. 
The “final” layer of encryption also guarantees that the “plaintexts” of the protocol 
we have sketched so far (without the “final” layer) are distinct for all honest senders 
(and different from corrupt senders) with overwhelming probability. 

8. Outer Cryptosystem. The protocol is still vulnerable to a subtle attack that uses the 
error-tracing mechanism itself to violate sender privacy. The problem is that tracing 
occurs in two additional indistinguishable cases: 

(a) Corrupt senders collude to create “colliding” ciphertexts (i.e., after removing 
some layers of encryption, the resulting ciphertexts are identical). 

(b) Corrupt mix-server(s) collude with corrupt sender(s) to copy some of an honest 
sender’s ciphertexts. 

In both cases tracing will complete successfully (since no inputs were replaced in 
the middle of the mix-net). Because in the first case the mix-servers are all honest, 
we cannot simply abort if this situation occurs. On the other hand, in the second 
case, we may be forced to trace an honest ciphertext from beginning to end (we 
trace a broken ciphertext back to a corrupt sender, then trace forward all of that 
sender’s inputs, which include a copy of an honest ciphertext). Since the corrupt 
sender knows the identity of the sender from whom the ciphertext was copied, if 
we decrypt that value the honest sender’s privacy is violated. 

To prevent this, we add an “outer” layer of encryption (the box labeled “decrypt 
(outer)”): under a public-key whose secret key is shared by all the mix-servers, 
each sender forms a single “bundled” ciphertext. After all the ciphertext bundles 
are received, the mix-servers recover the secret key of the outer cryptosystem and 
the bundles are publicly decrypted and “split” into the separate copyset ciphertexts. 
This countermeasure works due to the CCA2 security of the cryptosystem: CCA2 
security ensures that no corrupt coalition of mix-servers and senders can make par- 
tial copies of an honest sender’s copyset: either they copy a bundle in its entirety 
(in which case they are removed due to being duplicates) or they create a bundle 
that is completely independent of the honest senders’ bundles (in which case the 
probability of a collision is negligible). 

3 Notation 

For an integer $\ell$, we denote the set $\{1, \ldots, \ell\}$ by $[1, \ell]$. The security parameter, $n$, 
represented in unary, is an implicit input to all protocols and functionalities. Whenever 
we say a quantity $\epsilon$ is negligible, we mean that it is negligible in the security parameter, 
i.e., for every $c > 0$ we have $\epsilon(n) < n^{-c}$ for all but finitely many $n$. We write $x \in a$ 
for a list $a = (a_1, \ldots, a_\ell)$ if and only if $x \in \{a_1, \ldots, a_\ell\}$. The length of $a$ is denoted 
by $|a|$. For any index set $I \subset [1, \ell]$ of size $\ell'$, we write $(a_i)_{i \in I} = (a_{i_1}, \ldots, a_{i_{\ell'}})$, where 
$I = \{i_1, \ldots, i_{\ell'}\}$ with $i_1 < i_2 < \cdots < i_{\ell'}$. We say that a list $b = (b_1, \ldots, b_{\ell'})$ is a subset 
of $a$ and write $b \subset a$, if and only if $\{b_1, \ldots, b_{\ell'}\} \subset \{a_1, \ldots, a_\ell\}$ (with multiplicity). 
We use $\mathsf{Sort}(a)$ to denote the lexicographically sorted list of elements from $a$ (with 
multiplicity). We write $a \setminus b$ for $\mathsf{Sort}(\{a_1, \ldots, a_\ell\} \setminus \{b_1, \ldots, b_{\ell'}\})$ (with multiplicity in 
the set difference). We also write $a \circ b$ for the concatenated list $(a_1, \ldots, a_\ell, b_1, \ldots, b_{\ell'})$. 
We denote by $\mathsf{Unique}(a)$ the sorted list where each element of $a$ appears exactly once. 
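
The list operations above can be sketched in a few lines of Python. This is only an illustration under the assumption that list elements are hashable and mutually comparable (for example strings); the function names are hypothetical and not part of the paper.

```python
from collections import Counter


def sort_list(a):
    """Sort(a): lexicographically sorted list, keeping multiplicity."""
    return sorted(a)


def unique(a):
    """Unique(a): sorted list in which each element of a appears exactly once."""
    return sorted(set(a))


def difference(a, b):
    """a \\ b: multiset difference, returned in sorted order."""
    return sorted((Counter(a) - Counter(b)).elements())


def is_sublist(b, a):
    """b is a subset of a with multiplicity: no element occurs more often in b than in a."""
    return not (Counter(b) - Counter(a))


def concat(a, b):
    """a o b: the concatenated list."""
    return list(a) + list(b)
```

Using `Counter` makes the "with multiplicity" qualifier explicit: removing one copy of an element leaves the other copies in place.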

We denote a cryptosystem by CS = (Gen, Enc, Dec), where Gen, Enc, and Dec 
denote the key generation algorithm, the encryption algorithm, and the decryption 
algorithm respectively. To deal with nested encryption as needed in a Chaumian mix-net, 
we simply assume that a plaintext of any length can be encrypted, but that 
indistinguishability only holds for plaintexts of the same length. We write $c = \mathsf{Enc}_{pk}(m, r)$ for 
the encryption of a plaintext $m$ using randomness $r$, and $\mathsf{Dec}_{sk}(c) = m$ for the 
decryption of a ciphertext $c$. We often view Enc as a probabilistic algorithm and drop $r$ from 
our notation. We assume that malformed ciphertexts are decrypted to a special symbol 
different from all normal plaintexts. 

We extend our notation to lists of plaintexts, ciphertexts and keys as follows. For 
a list of plaintexts $m = (m_1, \ldots, m_\ell)$ and a key pair $(pk, sk)$ with $pk = (pk_1, \ldots, pk_{\ell'})$ 
and $sk = (sk_1, \ldots, sk_{\ell'})$ we write $c = \mathsf{Enc}_{pk}(m)$, where $c = (c_1, \ldots, c_\ell)$ with 
$c_i = \mathsf{Enc}_{pk_1}(\mathsf{Enc}_{pk_2}(\cdots \mathsf{Enc}_{pk_{\ell'}}(m_i) \cdots))$. Similarly, $m = \mathsf{Dec}_{sk}(c)$ is defined by 
$m_i = \mathsf{Dec}_{sk_{\ell'}}(\mathsf{Dec}_{sk_{\ell'-1}}(\cdots \mathsf{Dec}_{sk_1}(c_i) \cdots))$. We stress that when we use Enc as a 
probabilistic algorithm with a list of messages or public keys, we assume that the 
random values used in each encryption are chosen randomly and independently. We use the 
notation $a \| b$ for the concatenation of two bitstrings. We define the function $\mathsf{Split}_t(a)$ to 
divide a bitstring $a$, whose length is a multiple of $t$, into $t$ chunks of equal length and 
turn it into a list, i.e., $(a_1, \ldots, a_t) = \mathsf{Split}_t(a_1 \| \cdots \| a_t)$ when all the lengths $|a_i|$ are equal. 
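
As a rough illustration of the nested notation and of Split, the following Python sketch uses a toy, completely insecure stand-in for the cryptosystem: a "ciphertext" is just the string `key:nonce:payload`, and the public and secret key are the same label. All names here are assumptions made for the example.

```python
import secrets

BOT = "BOT"  # stand-in for the special symbol returned on malformed ciphertexts


def enc(pk, m):
    """Toy probabilistic 'encryption': tag the payload with the key label and a fresh nonce."""
    return f"{pk}:{secrets.token_hex(8)}:{m}"


def dec(sk, c):
    """Toy 'decryption': strip one layer if the key label matches, else return BOT."""
    parts = c.split(":", 2)
    if len(parts) != 3 or parts[0] != sk:
        return BOT
    return parts[2]


def enc_list(pks, ms):
    """c_i = Enc_{pk_1}(Enc_{pk_2}(... Enc_{pk_last}(m_i) ...)) for every m_i."""
    out = []
    for m in ms:
        c = m
        for pk in reversed(pks):  # innermost layer uses the last key
            c = enc(pk, c)
        out.append(c)
    return out


def dec_list(sks, cs):
    """m_i = Dec_{sk_last}(... Dec_{sk_1}(c_i) ...) for every c_i."""
    out = []
    for c in cs:
        m = c
        for sk in sks:  # outermost layer is removed first
            m = dec(sk, m)
        out.append(m)
    return out


def split(t, a):
    """Split_t(a): cut a string whose length is a multiple of t into t equal chunks."""
    size, rem = divmod(len(a), t)
    if rem:
        raise ValueError("length of a must be a multiple of t")
    return [a[i * size:(i + 1) * size] for i in range(t)]
```

Note that the outermost layer belongs to the first key, so decryption proceeds through the keys in their listed order.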

4 Definitions and Conventions 

We consider a mix-net employing $k$ mix-servers $\mathcal{M}_1, \ldots, \mathcal{M}_k$ that provide anonymity 
for a group of $N$ senders $\mathcal{P}_1, \ldots, \mathcal{P}_N$. Throughout, $\mathcal{M}$ and $\mathcal{P}$ denote the sets of all 
mix-servers and senders respectively. We let $J_{\mathcal{M}} \subset [1, \lambda]$ and $I_{\mathcal{P}} \subset [1, N]$ denote the 
index sets of corrupted mix-servers and senders respectively. We let $J^* \subset J_{\mathcal{M}}$ denote 
the index set of mix-servers identified as corrupted so far. This set may grow throughout 
an execution. 

We present and analyze the main components of our mix-net in the universal 
composability framework, with non-blocking adversaries, i.e., adversaries that do not 
block the delivery of messages indefinitely. We use superscripts to distinguish different 
functionalities and protocols, for example $\mathcal{F}^{bb}$ for a bulletin board and $\pi^{c}$ for Chaum’s 
mix-net. The ideal adversary (simulator) of the ideal model is denoted by $\mathcal{S}$. When there 
is no ambiguity, we use the same notation for dummy parties and real parties. 

We use a number of conventions to simplify the exposition. Whenever we say a party 
“hands” a message to a functionality, we mean that the party sends the message to the 
corresponding dummy party who will then forward it to the functionality. All our func- 
tionalities capture distributed protocols where messages sent to more than one party 
can be delayed arbitrarily by the adversary, and all such messages are also given to the 
adversary. Thus, when we say that a functionality hands a message to more than one 
party, we mean that the message is passed to the adversary, who then schedules the de- 
livery of the message to the parties. When a party “inputs” a message to a subprotocol, 
we mean that he executes the algorithm of the corresponding party with the same mes- 
sage. A party or protocol is said to “wait for” an input of a given form if any other input 
is immediately returned to the sender. Similarly, a party can wait for a message to ap- 
pear on the bulletin board. In practice this would be implemented using a time-out, after 
which some default value is taken to be the message. Some of our functionalities give 
an output before receiving any input, which makes no sense in an event-driven model 
like the universal composability framework where execution starts by activating the en- 
vironment. This is merely a useful convention, since we can easily fix this problem by 
allowing parties to request the given data. 

In all of our protocols, security holds only as long as the adversary corrupts fewer than 
$\lambda$ mix-servers, where $1 < \lambda < k$ is a parameter of the protocol. All our functionalities 
and protocols may fail to give an output if more than $\min(\lambda - 1, k - \lambda)$ mix-servers are 
corrupted. To capture the case $\lambda > k/2$ with minimal notational overhead, we simply 
assume that even a non-blocking simulator can block messages indefinitely in this case. 

We use the subroutine Agree(Tag), parameterized by a label Tag, to simplify the 
description of some of our ideal functionalities. The subroutine waits until each 
mix-server $\mathcal{M}_j$ has submitted a pair (Tag, $m_j$) for some message $m_j$. If at least $\lambda$ 
mix-servers submitted identical $m_j$, then the subroutine returns this value and otherwise it 
halts the complete ideal functionality, e.g., the functionality could hand $\bot$ to all parties 
and ignore inputs from then on. The message $m_j$ can be an empty string, in which case 
the subroutine is only used to capture the robustness property of the functionality. In 
Appendix A we give a formal definition of the subroutine Agree(Tag). We use the same 
convention for protocols, i.e., if an ideal functionality used by the protocol aborts, then 
the protocol aborts as well. These conventions allow us to capture the robustness of a 
protocol by requiring a non-blocking simulator for a non-blocking adversary. 
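
The agreement rule can be illustrated with a minimal Python sketch. It assumes all submissions have already arrived and are hashable; `agree` and its arguments are hypothetical names, and `None` plays the role of the abort symbol.

```python
from collections import Counter


def agree(submissions, lam):
    """Return a value submitted by at least lam mix-servers, or None if there is none.

    submissions maps a mix-server index j to its submitted message m_j.
    """
    counts = Counter(submissions.values())
    for value, n in counts.items():
        if n >= lam:
            return value
    return None  # the calling functionality would hand the abort symbol to all parties


# Example: five mix-servers, threshold lam = 3, agreeing on a set of culprits.
culprits = agree({1: frozenset({2}), 2: frozenset(), 3: frozenset({2}),
                  4: frozenset({2}), 5: frozenset({2, 4})}, lam=3)
```

When the threshold exceeds half the number of mix-servers, at most one value can reach it, so the order in which candidates are inspected does not matter.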

4.1 Useful Functionalities 

Our results are given in a hybrid model with distributed key generation functionalities of 
two types, a bulletin board functionality, and a proof of correct decryption functionality. 
Formal descriptions of these functionalities are presented in Appendix B. The first key 
generation functionality, $\mathcal{F}_j^{kg}$, generates a public key $pk_j$ such that only the $j$th 
mix-server knows the corresponding secret key $sk_j$. The second functionality, $\mathcal{F}^{dkg}$, differs 
only in that no mix-server learns the secret key $sk$ corresponding to the generated public 
key $pk$. In both functionalities, any subset of $\lambda$ mix-servers can recover the secret key. 
The bulletin board functionality, denoted by $\mathcal{F}^{bb}$, is used by parties to announce their 
messages. That is, a message can be posted by any party and read by any other one. To 
simplify the exposition, we simply say that a message is “published” when it appears on 
the public bulletin board. A published message cannot be deleted or modified once 
posted. The proof of correct decryption functionality, $\mathcal{F}_j^{pd}$, is used to prove that the $j$th 
mix-server has correctly decrypted a known ciphertext into a known plaintext. A subset 
of $\lambda$ mix-servers must agree on the pair of plaintext and ciphertext. Assuming that 
the underlying cryptosystem allows the $j$th mix-server to recover the randomness used 
during encryption from the ciphertext itself, the realization of the functionality becomes 
trivial. In other words, the proof of correct decryption simply consists of revealing the 
randomness used for encryption. Our main result, Theorem 1, holds if this solution is 
employed, but only in a standalone model (see the full version of this paper for details). 
In any case, proofs of correct decryption are only used to trace ciphertexts to identify 
corrupted senders or mix-servers. 

4.2 Mix-Nets 

We use ideal mix-net functionalities similar to those in prior work, but in a slightly simplified 
form in that we assume that each sender submits exactly one input. Functionality 1 
presents a natural mix-net. Our results are easy to generalize to the case where senders 
can submit more than one input (this holds also for Functionality 2 and Functionality 3). 

The protocol we construct does not quite implement the natural mix-net. Thus, we 
present a relaxed mix-net (Functionality 2), which we are able to securely realize, and 
then argue that it still provides sufficient guarantees. The relaxed functionality first 
hands the adversary (simulator) a public key. Then it waits for inputs from all the 
senders, encrypts the messages of the honest senders, and then hands the resulting ci- 
phertexts in sorted order to the adversary. The adversary is then asked to provide his 
own inputs in encrypted form on behalf of corrupted senders. The final output is the 
sorted decryption of the union of the ciphertexts computed by the functionality and 
those provided by the adversary (after duplicates are removed). For technical reasons 
the functionality uses several public keys and encrypts the messages under all keys. 

This functionality provides unconditional privacy for honest senders. The relaxation 
lies in the ability of an unbounded adversary to adaptively choose the messages of the 
corrupted senders based on the set of inputs of the honest senders, but a CCA2-secure 
cryptosystem prevents this for efficient adversaries. 

We define a mix-net with abort (Functionality 3) that either gives a proper output or 
aborts after identifying a mix-server as culprit (with no information about the submitted 
messages at all). A relaxed mix-net can be constructed using such a mix-net with abort. 
The mix-net with abort waits for inputs from all the senders and then outputs these 
messages in encrypted form (as in the relaxed mix-net). Then it allows the mix-servers 
to agree on a list of known corrupted mix-servers. Finally, the adversary decides if the 
mix-net should abort or not. In the former case, the adversary must provide the index of 
a previously unknown corrupted mix-server, and this is forwarded to all mix-servers. In 
the latter case, the mix-net outputs the result like in the relaxed mix-net. 

In the full version of this paper we describe a protocol using Functionality 3 that 
securely realizes Functionality 2. The idea is to use $\lambda$ instances of Functionality 3. 
Each sender submits a copy of his input to all functionalities. The mix-servers then run 
them sequentially until one produces an output without aborting. To ensure that this 
scheme eventually gives an output, the mix-servers jointly keep track of the identified 
corrupted mix-servers. 
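
The sequential use of several mix-nets with abort can be sketched as follows. This is a schematic Python illustration only: each instance is modelled as a hypothetical callable that receives the set of known culprits and returns either `("Mixed", output)` or `("Culprit", d)`. Since every abort names a previously unknown corrupted mix-server and fewer than λ mix-servers are corrupted, at most λ − 1 instances can abort.

```python
def run_until_output(instances, known_culprits=frozenset()):
    """Run mix-net-with-abort instances in order until one produces an output.

    Returns (output, culprits), where culprits is the set identified along the way.
    """
    culprits = set(known_culprits)
    for run in instances:
        tag, value = run(frozenset(culprits))
        if tag == "Mixed":
            return value, culprits
        culprits.add(value)  # tag == "Culprit": remember the newly identified server
    raise RuntimeError("every instance aborted; more servers are corrupted than assumed")


def make_instance(cheater, output):
    """Hypothetical instance: aborts once if `cheater` is corrupted and not yet known."""
    def run(known):
        if cheater is not None and cheater not in known:
            return ("Culprit", cheater)
        return ("Mixed", output)
    return run


# lam = 3 instances, two corrupted mix-servers (indices 2 and 5).
instances = [make_instance(2, ["a", "b"]), make_instance(5, ["a", "b"]),
             make_instance(2, ["a", "b"])]
result, found = run_until_output(instances)
```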
Functionality 1 (Natural Mix-Net). The natural mix-net functionality $\mathcal{F}^{mn}$ 
executing with dummy senders $\mathcal{P}$, dummy mix-servers $\mathcal{M}$ and ideal adversary $\mathcal{S}$ 
proceeds as follows. 

1. Let $I = [1, N]$. While $I \neq \emptyset$: 

a) Wait for a message (Message, $m_i$) with $m_i \in \{0, 1\}^n$ from some dummy 
sender $\mathcal{P}_i$ with $i \in I$. 

b) Set $I \leftarrow I \setminus \{i\}$ and hand (MessageReceived, $i$) to $\mathcal{S}$. 

2. Hand (Mixed, $\mathsf{Sort}(m_1, \ldots, m_N)$) to $\mathcal{S}$ and $\mathcal{M}$. 


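Functionality 2 (Relaxed Mix-Net). The relaxed mix-net functionality $\mathcal{F}^{rmn}$ 
executing with dummy senders $\mathcal{P}$, dummy mix-servers $\mathcal{M}$ and ideal adversary $\mathcal{S}$ 
proceeds as follows. 

1. Hand (PublicKeys, $(pk_\ell)_{\ell=1}^{\lambda}$) to $\mathcal{S}$, where $(pk_\ell, sk_\ell) = \mathsf{Gen}(1^n)$. 

2. Let $I = [1, N]$. While $I \neq \emptyset$: 

a) Wait for a message (Message, $m_i$) with $m_i \in \{0, 1\}^n$ from some dummy 
sender $\mathcal{P}_i$ with $i \in I$. 

b) Set $I \leftarrow I \setminus \{i\}$ and hand (MessageReceived, $i$) to $\mathcal{S}$. 

3. Let $L_\ell = \mathsf{Sort}((\mathsf{Enc}_{pk_\ell}(m_i))_{i \in [1,N] \setminus I_{\mathcal{P}}})$. Hand (HonestCiphertexts, $(L_\ell)_{\ell=1}^{\lambda}$) 
to $\mathcal{S}$ and wait to get back (CorruptCiphertexts, $L'$, $\ell^*$), where $|L'| \le |I_{\mathcal{P}}|$ and 
$1 \le \ell^* \le \lambda$. 

4. Hand (SecretKey, $sk_{\ell^*}$) to $\mathcal{S}$ and (Mixed, $\mathsf{Sort}(\mathsf{Dec}_{sk_{\ell^*}}(\mathsf{Unique}(L_{\ell^*} \circ L')))$) to 
$\mathcal{M}$. 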


Functionality 3 (Mix-Net With Abort). The mix-net with abort functionality 
$\mathcal{F}^{mna}$ executing with dummy senders $\mathcal{P}$, dummy mix-servers $\mathcal{M}$, and ideal 
adversary $\mathcal{S}$ proceeds as follows. 

1. Generate $(pk, sk) = \mathsf{Gen}(1^n)$ and hand (PublicKey, $pk$) to $\mathcal{S}$. 

2. Let $I = [1, N]$. Then while $I \neq \emptyset$: 

a) Wait for a message (Message, $m_i$) with $m_i \in \{0, 1\}^n$ from some dummy 
sender $\mathcal{P}_i$ with $i \in I$. 

b) Set $I \leftarrow I \setminus \{i\}$, let $v_i = \mathsf{Enc}_{pk}(m_i)$, and hand (MessageReceived, $i$) to $\mathcal{S}$. 

3. Wait for a common input $J^* \subset J_{\mathcal{M}}$ from dummy mix-servers, i.e., $J^* \leftarrow$ 
Agree(Culprits). 

4. Let $L = \mathsf{Sort}((v_i)_{i \in [1,N] \setminus I_{\mathcal{P}}})$ and wait for a message EncryptPlaintexts 
from $\mathcal{S}$. Then hand (HonestCiphertexts, $L$) to $\mathcal{S}$ and wait to receive 
(CorruptCiphertexts, $L'$) where $|L'| \le |I_{\mathcal{P}}|$, or (Culprit, $d$) where $d \in J_{\mathcal{M}} \setminus J^*$. 
In the latter case, hand (Culprit, $d$) to $\mathcal{M}$ and halt. 

5. Hand (SecretKey, $sk$) to $\mathcal{S}$ and (Mixed, $\mathsf{Sort}(\mathsf{Dec}_{sk}(\mathsf{Unique}(L \circ L')))$) to $\mathcal{M}$. 


5 Chaum’s Mix-Net 

Consider Chaum’s original mix-net with $\lambda$ mix-servers in the chain. Each mix-server 
$\mathcal{M}_j$ generates a key pair $(pk_j, sk_j)$ and a sender wraps her message $m_i$ in $\lambda$ layers 
of encryption and submits a ciphertext $c_i = \mathsf{Enc}_{pk_1}(\mathsf{Enc}_{pk_2}(\cdots \mathsf{Enc}_{pk_\lambda}(m_i) \cdots))$. 
Then the mix-servers form an initial list $L_0 = (c_1, \ldots, c_N)$ and sequentially peel off 
layers of encryption after removing the duplicates. That is, for $j = 1, \ldots, \lambda$, the $j$th 
mix-server computes $L_j = \mathsf{Sort}(\mathsf{Dec}_{sk_j}(\mathsf{Unique}(L_{j-1})))$. Thus, $\mathsf{Unique}(L_\lambda)$ is the 
sorted list of plaintexts without duplicates. This mix-net is neither secure against active 
adversaries nor robust, but it nevertheless forms the basis of our constructions. We 
formalize this in Protocol 1 below and later extend it in two different ways in Protocol 2 
and Protocol 3. We assume that the main protocol (Protocol 4) keeps track of the set $J^*$ 
of indices of mix-servers identified as corrupted so far; see Step 2 of Protocol 1 below. 
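
The peeling process can be illustrated with a small Python sketch. It is not a secure implementation: the "cryptosystem" is a toy in which a ciphertext is the string `key:nonce:payload` and the same label serves as public and secret key. All names are assumptions made for the example.

```python
import secrets


def enc(pk, m):
    """Toy 'encryption': key label, fresh nonce, payload."""
    return f"{pk}:{secrets.token_hex(8)}:{m}"


def dec(sk, c):
    """Toy 'decryption': strip one layer, or return a marker for malformed input."""
    parts = c.split(":", 2)
    return parts[2] if len(parts) == 3 and parts[0] == sk else "BOT"


def wrap(pks, m):
    """Sender side: Enc_{pk_1}(Enc_{pk_2}(... Enc_{pk_last}(m) ...))."""
    for pk in reversed(pks):
        m = enc(pk, m)
    return m


def chaum_mix(sks, L0):
    """Mix-server side: L_j = Sort(Dec_{sk_j}(Unique(L_{j-1}))), output Unique(L_last)."""
    L = list(L0)
    for sk in sks:
        L = sorted(dec(sk, c) for c in set(L))
    return sorted(set(L))


keys = ["k1", "k2", "k3"]
inputs = [wrap(keys, m) for m in ["bob", "alice", "carol"]]
inputs.append(inputs[0])  # a verbatim copy is dropped by duplicate removal
mixed = chaum_mix(keys, inputs)
```

Sorting after each decryption is what hides the correspondence between input and output positions in this sketch.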


Protocol 1 (Chaum’s Mix-Net, $\pi^{c}$). 

Mix-servers. The $j$th mix-server $\mathcal{M}_j$ proceeds as follows when executing with 
functionalities $\mathcal{F}^{bb}$ and $\mathcal{F}_1^{kg}, \ldots, \mathcal{F}_\lambda^{kg}$. 

1. Wait for (PublicKey, $pk_\ell$) from $\mathcal{F}_\ell^{kg}$ for $\ell = 1, \ldots, \lambda$. Let $pk = (pk_1, \ldots, pk_\lambda)$ 
and output (PublicKey, $pk$). Wait for (SecretKey, $sk_j$) from $\mathcal{F}_j^{kg}$ if $j \in [1, \lambda]$. 

2. Wait for an input (Culprits, $J^*$). For $\ell = 1, \ldots, \lambda$: if $\ell \in J^*$, then hand Recover 
to $\mathcal{F}_\ell^{kg}$ and wait for a response (SecretKey, $sk_\ell$). 

3. Wait for an input (Ciphertexts, $L_0$). For $\ell = 1, \ldots, \lambda$ do the following and 
output (Mixed, $\mathsf{Unique}(L_\lambda)$): 

(a) If $\ell \in J^*$ or $\ell = j$, then set $L_\ell = \mathsf{Sort}(\mathsf{Dec}_{sk_\ell}(\mathsf{Unique}(L_{\ell-1})))$, and 
publish (Decryption, $L_\ell$). 

(b) Otherwise, wait until $\mathcal{M}_\ell$ publishes (Decryption, $L_\ell$) (or we published $L_\ell$, 
since $sk_\ell$ was recovered), where $|L_\ell| = |\mathsf{Unique}(L_{\ell-1})|$. 

Protocol 2 and Protocol 3 formalize the two nested mix-nets used in our main protocol. 
Recall from Section 2 that the first protocol is an optimistic execution of Chaum’s 
mix-net. The privacy of this mix-net is only required to temporarily randomize the input 
to the second mix-net. This is needed to argue that it is hard to replace all ciphertexts 
submitted by a non-empty proper subset of the honest senders without being identified 
as a cheater. When Protocol 2 has completed, the optimistic execution is verified 
explicitly by simply recovering the secret keys of all mix-servers. 


Protocol 2 (Chaum’s Mix-Net with Explicit Verification, $\pi^{cev}$). 

Mix-servers. The $j$th mix-server $\mathcal{M}_j$, when executing with functionalities $\mathcal{F}^{bb}$ and 
$\mathcal{F}_1^{kg}, \ldots, \mathcal{F}_\lambda^{kg}$, first runs Chaum’s mix-net (Protocol 1) and then proceeds as follows. 

4. Wait for an input Verify. Then for $\ell = 1, \ldots, \lambda$, where $\ell \notin J^*$: 

(a) If $\ell = j$, then publish (SecretKey, $sk_j$). 

(b) If $\ell \neq j$ and $\ell \notin J^*$, then wait until $\mathcal{M}_\ell$ publishes (SecretKey, $sk_\ell$), and 
halt with output (Culprit, $\ell$) if $sk_\ell$ does not correspond to $pk_\ell$ or if $L_\ell \neq 
\mathsf{Sort}(\mathsf{Dec}_{sk_\ell}(\mathsf{Unique}(L_{\ell-1})))$. 

5. Halt with output (SecretKey, $sk$), where $sk = (sk_1, \ldots, sk_\lambda)$. 


In our second variant of Chaum’s mix-net (Protocol 3), the mix-servers proceed 
optimistically, but in contrast to Protocol 2 they do not later verify the complete execution 
explicitly. Instead, they trace a subset of ciphertexts backwards and forwards through 
the mix-net and reveal how they are decrypted in the process. In the main protocol a 
small subset of dummy ciphertexts (submitted by the mix-servers) are always traced 
forward to show that these were processed correctly. As explained in Section 2, the idea 
is that starting from the randomly permuted output of Protocol 2 the adversary must 
avoid modifying the traced ciphertexts to avoid detection. In other words, to cheat 
without detection, a corrupted mix-server cannot simply replace all ciphertexts. However, 
tracing starts by tracing any ciphertexts that do not have exactly $t$ copies backwards 
to distinguish the case where a corrupted sender submits a malformed set of 
ciphertexts from the case where a corrupted mix-server processes his input incorrectly. Only 
then are the dummies, and possibly additional ciphertexts, traced forwards through the 
mix-net. 


Protocol 3 (Chaum’s Mix-Net with Partial Tracing, $\pi^{cpt}$). 

Mix-servers. The $j$th mix-server $\mathcal{M}_j$, when executing with functionalities $\mathcal{F}^{bb}$, 
$\mathcal{F}_1^{kg}, \ldots, \mathcal{F}_\lambda^{kg}$, and $\mathcal{F}_1^{pd}, \ldots, \mathcal{F}_\lambda^{pd}$, runs Chaum’s mix-net (Protocol 1), hands 
(SecretKey, $sk_j$) to $\mathcal{F}_j^{pd}$ if $j \in [1, \lambda]$, and then proceeds as follows. 

4. Backward Tracing. Wait for an input (TraceB, $B_\lambda$), where $B_\lambda$ is the list of 
ciphertexts to be traced backwards. For $\ell = \lambda, \ldots, 1$ do the following and then 
output (Traced, $B_0$): 

(a) Expand $B_\ell$ to a list $B'_\ell$ by adding the removed duplicates, i.e., the expanded 
list $B'_\ell$ includes all copies in $L_\ell$ of every ciphertext occurring in $B_\ell$. 

(b) If $\ell \in J^*$ or $\ell = j$, then identify $B_{\ell-1} \subset L_{\ell-1}$ such that $B'_\ell = 
\mathsf{Dec}_{sk_\ell}(B_{\ell-1})$ and publish (TracedB, $B_{\ell-1}$). Otherwise, wait until $\mathcal{M}_\ell$ 
publishes (TracedB, $B_{\ell-1}$) with $B_{\ell-1} \subset L_{\ell-1}$. 

(c) If $\ell \notin J^*$, then hand (Verify, $B'_\ell$, $B_{\ell-1}$) to $\mathcal{F}_\ell^{pd}$ and halt with (Culprit, $\ell$) if 
it returns False. 

5. Forward Tracing. Wait for an input (TraceF, $F_0$), where $F_0$ is the list of ciphertexts to 
be traced forward. For $\ell = 1, \ldots, \lambda$ do the following and then halt with output 
(Traced, $F_\lambda$): 

(a) Let $F'_{\ell-1} = \mathsf{Unique}(F_{\ell-1})$. 

(b) If $\ell \in J^*$ or $\ell = j$, then let $F_\ell = \mathsf{Dec}_{sk_\ell}(F'_{\ell-1})$ and publish (TracedF, $F_\ell$). 
Otherwise, wait until $\mathcal{M}_\ell$ publishes (TracedF, $F_\ell$) with $F_\ell \subset L_\ell$. 

(c) If $\ell \notin J^*$, then hand (Verify, $F_\ell$, $F'_{\ell-1}$) to $\mathcal{F}_\ell^{pd}$ and halt with (Culprit, $\ell$) if 
it returns False. 


Forward tracing of the dummy ciphertext list $F_0$, a subset of the input list $L_0$ submitted 
by the mix-servers, is done in the natural way. For $\ell = 1, \ldots, \lambda$, the $\ell$th mix-server 
computes $F_\ell = \mathsf{Dec}_{sk_\ell}(\mathsf{Unique}(F_{\ell-1}))$ and proves that he did so correctly. The other 
mix-servers verify the proof and that $F_\ell \subset L_\ell$. 

Backward tracing of a list $B_\lambda$, a subset of the output list $\mathsf{Unique}(L_\lambda)$, is more complicated 
in that we must invert the process of duplicate removal. For $\ell = \lambda, \ldots, 1$, all 
mix-servers first expand $B_\ell$ into a list $B'_\ell$ by including all copies in $L_\ell$ of each ciphertext in 
$B_\ell$, and then the $\ell$th mix-server computes $B_{\ell-1} \subset L_{\ell-1}$ such that $B'_\ell = \mathsf{Dec}_{sk_\ell}(B_{\ell-1})$ 
and proves that this relation holds. Thus, the expansion is the inversion of how Unique 
removed duplicates of traced ciphertexts during processing. 
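
The role of the expansion step in backward tracing may be easier to see in code. The Python sketch below uses a toy, insecure "cryptosystem" (a ciphertext is the string `key:nonce:payload`) and replaces the proof of correct decryption by a direct recomputation; every name in it is an assumption made for the example.

```python
import secrets


def enc(pk, m):
    return f"{pk}:{secrets.token_hex(8)}:{m}"


def dec(sk, c):
    parts = c.split(":", 2)
    return parts[2] if len(parts) == 3 and parts[0] == sk else "BOT"


def mix_lists(sks, L0):
    """Return [L_0, ..., L_last] with L_l = Sort(Dec_{sk_l}(Unique(L_{l-1})))."""
    lists = [list(L0)]
    for sk in sks:
        lists.append(sorted(dec(sk, c) for c in set(lists[-1])))
    return lists


def trace_backwards(sks, lists, B_last):
    """Trace the ciphertexts in B_last back to the inputs they originated from."""
    B = list(B_last)
    for l in range(len(sks), 0, -1):
        targets = set(B)
        # Expansion: put back every copy in L_l of a traced ciphertext.
        B_expanded = sorted(c for c in lists[l] if c in targets)
        # Preimages in L_{l-1}; duplicates there were removed before decryption.
        B_prev = sorted(c for c in set(lists[l - 1]) if dec(sks[l - 1], c) in targets)
        # Stand-in for the proof of correct decryption.
        if sorted(dec(sks[l - 1], c) for c in B_prev) != B_expanded:
            raise ValueError(f"mix-server {l} cannot justify its output")
        B = B_prev
    return B


keys = ["k1", "k2"]
inner = enc("k2", "x")
o1, o2 = enc("k1", inner), enc("k1", inner)  # two inputs hiding the same inner ciphertext
o3 = enc("k1", enc("k2", "y"))
lists = mix_lists(keys, [o1, o2, o3])
origin = trace_backwards(keys, lists, ["x"])
```

In this example the inner ciphertext appears twice in the intermediate list, so tracing the single output "x" leads back to both inputs that contained it.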

The correctness of the decryption for the $j$th mix-server is verified using the proof 
of correct decryption functionality $\mathcal{F}_j^{pd}$. Notice that for the dummies, it suffices that 
each mix-server simply reveals the randomness used to encrypt his own dummy inputs. 
However, for senders’ inputs this may not be possible for a general cryptosystem since 
the randomness is chosen by the corresponding sender and may not be known to the 
decrypting mix-server. Nevertheless, one possible incarnation of our protocol uses a 
cryptosystem that allows recovering the randomness used during encryption from the 
ciphertext itself during decryption. In this case, the proof of correct decryption used 
during tracing simply consists of revealing the randomness. 

6 Constructing a Mix-Net with Abort 

We are now ready to present the details of our mix-net with abort in Protocol 4. We use 
two nested instances of Chaum’s mix-net: one with explicit verification (Protocol 2), and 
one with partial tracing (Protocol 3). The lists of public keys of these mix-nets are denoted 
by $pk^{cev}$ and $pk^{cpt}$, each of which contains $\lambda$ keys. Each sender encrypts her message $m_i$ 
once using the additional joint “final” public key $pk^{f}$ to form a ciphertext $v_i$. This layer of 
encryption hides the inputs of the honest senders if the execution aborts. The ciphertext 
$v_i$ is then encrypted independently $t$ times with the additional joint “replication” public 
key $pk^{r}$. Recall that this prevents the last mix-server in Chaum’s mix-net with partial 
tracing (Protocol 3) from identifying all ciphertexts submitted by the same sender. The 
resulting ciphertexts are then encrypted using the lists $pk^{cpt}$ and $pk^{cev}$ of public keys 
of the two instances of Chaum’s mix-net. Finally, the $t$ encryptions are concatenated to 
form one plaintext chunk and then encrypted using the “outer” public key $pk^{o}$, which 
prevents a dishonest sender (with the collusion of some dishonest mix-servers) from partially 
copying an honest sender’s submission to break his privacy. In addition to the ciphertexts 
submitted by senders, each mix-server submits a dummy encryption of the zero message 
computed like a sender’s ciphertext. These ciphertexts prevent a corrupt mix-server from 
replacing all ciphertexts instead of guessing the positions of all ciphertexts submitted by 
a subset of the senders. 
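
The layering performed by a sender can be sketched as follows. This Python example is illustrative only: the "cryptosystem" is a toy in which a ciphertext is the string `key:nonce:payload`, key labels double as public and secret keys, and all function and parameter names are assumptions.

```python
import secrets


def enc(pk, m):
    return f"{pk}:{secrets.token_hex(8)}:{m}"


def dec(sk, c):
    parts = c.split(":", 2)
    return parts[2] if len(parts) == 3 and parts[0] == sk else "BOT"


def wrap(pks, m):
    """Nested encryption under a list of keys; the first key is the outermost layer."""
    for pk in reversed(pks):
        m = enc(pk, m)
    return m


def unwrap(sks, c):
    for sk in sks:
        c = dec(sk, c)
    return c


def split(t, a):
    size, rem = divmod(len(a), t)
    if rem:
        raise ValueError("length must be a multiple of t")
    return [a[i * size:(i + 1) * size] for i in range(t)]


def prepare_submission(m, pk_f, pk_r, pks_cpt, pks_cev, pk_o, t):
    """final layer -> t independent replication layers -> cpt -> cev -> one outer bundle."""
    v = enc(pk_f, m)
    copies = [wrap(pks_cev, wrap(pks_cpt, enc(pk_r, v))) for _ in range(t)]
    return enc(pk_o, "".join(copies))  # copies have equal length, so Split can undo this


cev, cpt = ["e1", "e2"], ["p1", "p2"]
u = prepare_submission("vote", "f", "r", cpt, cev, "o", t=3)
parts = split(3, dec("o", u))
recovered = [dec("r", unwrap(cpt, unwrap(cev, p))) for p in parts]
```

The three copies differ as strings because each layer uses fresh randomness, yet all of them decrypt to the same inner ciphertext.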

To process the ciphertexts, the mix-servers first remove the “outer” layer of encryption 
by jointly recovering the corresponding secret key $sk^{o}$. Then they execute the two 
instances of Chaum’s mix-net in sequence. We stress that the $t$ ciphertexts of each 
sender are processed independently at this stage. Then the secret keys in $sk^{cev}$ 
(corresponding to $pk^{cev}$) are recovered and the mix-servers verify the execution of the first 
mix-net explicitly. The “replication” secret key $sk^{r}$ corresponding to $pk^{r}$ is then recovered 
and all ciphertexts are decrypted. Finally, the processing in the second mix-net 
is verified for: (1) all ciphertexts of which there are not exactly $t$ copies (backward 
tracing), and (2) all dummy ciphertexts submitted by mix-servers and all ciphertexts 
intersecting with the ciphertexts traced backwards (forward tracing). If there is any 
inconsistency, the corrupted mix-server is identified and the execution aborts. If there is 
no inconsistency, then the “final” secret key $sk^{f}$ corresponding to $pk^{f}$ is recovered and 
the innermost layer of encryption is removed to reveal the plaintexts. 
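
The selection of ciphertexts for backward tracing amounts to a counting step. A minimal Python sketch, assuming a toy decryption function that simply strips a `key:nonce:` prefix (all names are hypothetical):

```python
from collections import Counter


def dec(sk, c):
    """Toy 'decryption' of a ciphertext of the form key:nonce:payload."""
    parts = c.split(":", 2)
    return parts[2] if len(parts) == 3 and parts[0] == sk else "BOT"


def replication_check(L_cpt, sk_r, t):
    """Return the ciphertexts whose decryption does not occur exactly t times."""
    decrypted = [dec(sk_r, c) for c in L_cpt]
    copies = Counter(decrypted)
    return [c for c, v in zip(L_cpt, decrypted) if copies[v] != t]


L_cpt = ["r:01:v1", "r:02:v1",             # a complete copyset for t = 2
         "r:03:v2",                        # only one copy: to be traced
         "r:04:v3", "r:05:v3", "r:06:v3"]  # one copy too many: to be traced
to_trace = replication_check(L_cpt, "r", t=2)
```

Only the flagged ciphertexts are handed to backward tracing; complete copysets are left untouched.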
Theorem 1 captures the security of Protocol 4. If we use a cryptosystem that allows 
recovering the randomness used for encryption, then our result still holds, but only in 
the standalone model where the simulator is allowed to rewind. The full version details 
this variation of the scheme. 


Protocol 4 (Mix-Net with Abort, $\pi^{mna}$). This protocol is executed with a bulletin 
board $\mathcal{F}^{bb}$, a mix-net with explicit verification $\pi^{cev}$, a mix-net with partial tracing 
$\pi^{cpt}$, and distributed key generation functionalities $\mathcal{F}_o^{dkg}$, $\mathcal{F}_r^{dkg}$, and $\mathcal{F}_f^{dkg}$. 

Senders. The $i$th sender $\mathcal{P}_i$ proceeds as follows on input $m_i \in \{0, 1\}^n$. 

1. Wait until $\lambda$ of the mix-servers have published identical lists 
(PublicKeys, $pk^{o}$, $pk^{cev}$, $pk^{cpt}$, $pk^{r}$, $pk^{f}$). If no such list exists, then abort. 

2. Let $v_i = \mathsf{Enc}_{pk^{f}}(m_i)$. 

3. Let $u_{i,s} = \mathsf{Enc}_{pk^{cev}}(\mathsf{Enc}_{pk^{cpt}}(\mathsf{Enc}_{pk^{r}}(v_i)))$, for $s = 1, \ldots, t$. 

4. Let $u_i = \mathsf{Enc}_{pk^{o}}(u_{i,1} \| \cdots \| u_{i,t})$ and publish (Ciphertext, $u_i$). 

Mix-servers. The $j$th mix-server $\mathcal{M}_j$ proceeds as follows on input $J_j^*$. 

1. Public Keys. Wait for public keys: (PublicKey, $pk^{o}$) from $\mathcal{F}_o^{dkg}$, 
(PublicKey, $pk^{cev}$) from $\pi^{cev}$, (PublicKey, $pk^{cpt}$) from $\pi^{cpt}$, 
(PublicKey, $pk^{r}$) from $\mathcal{F}_r^{dkg}$, and (PublicKey, $pk^{f}$) from $\mathcal{F}_f^{dkg}$. Then publish 
(PublicKeys, $pk^{o}$, $pk^{cev}$, $pk^{cpt}$, $pk^{r}$, $pk^{f}$). Wait until $\lambda$ of the mix-servers have 
published the same list, or abort if no such list can be found. 

2. Input Ciphertexts. Wait until every $\mathcal{P}_i$ has published her encrypted 
message (Ciphertext, $u_i$). Let $u_{N+j}$ be an encryption of zero as computed by a 
sender and publish (Ciphertext, $u_{N+j}$). Wait until every $\mathcal{M}_\ell$ has published 
(Ciphertext, $u_{N+\ell}$) and let $L^{in} = \mathsf{Unique}(u_1, \ldots, u_{N+k})$. 

3. Culprits Agreement. Publish (Culprits, $J_j^*$) and wait until $\lambda$ of the mix-servers 
have published identical (Culprits, $J^*$), or abort if no such set $J^*$ can be found. 
Input (Culprits, $J^*$) to $\pi^{cev}$ and $\pi^{cpt}$. 

4. Decrypt and Split. Hand Recover to $\mathcal{F}_o^{dkg}$ and wait for a response 
(SecretKey, $sk^{o}$). Let $L^{o} = \bigcirc_{u \in L^{in}} \mathsf{Split}_t(\mathsf{Dec}_{sk^{o}}(u))$. 

5. Chaum’s Mix-Net. Input (Ciphertexts, $L^{o}$) to $\pi^{cev}$ and wait for an output 
(Mixed, $L^{cev}$). 

6. Chaum’s Mix-Net. Input (Ciphertexts, $L^{cev}$) to $\pi^{cpt}$, and wait for an output 
(Mixed, $L^{cpt}$). 

This protocol is continued below, after Theorem 1. 


Theorem 1. Let CS be a CCA2 secure cryptosystem. Then Protocol 4 securely realizes 
Functionality 3 with respect to static active adversaries that corrupt fewer than $\lambda$ of the 
mix-servers and any number of senders, provided that $t$ is chosen such that $H^{-(t-1)}$ is 
negligible, where $H > 1$ is the number of honest parties. 

Due to our conventions in Section 4 and the definition of Functionality 3, the theorem 
also captures the robustness of the protocol, i.e., it gives an output provided that at most 
$\min(\lambda - 1, k - \lambda)$ parties are corrupted. 
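
To get a feeling for the size of the replication parameter, one can read the condition in Theorem 1 as requiring that $H^{-(t-1)}$ be small, and fix a concrete target of $2^{-\kappa}$ in place of the asymptotic notion of negligibility. Both this reading of the bound and the concrete target are assumptions made for this illustration, and the function name is hypothetical.

```python
def min_copies(H, kappa):
    """Smallest t with H**-(t-1) <= 2**-kappa, using exact integer arithmetic."""
    if H < 2:
        raise ValueError("the bound is only meaningful for H > 1")
    t = 1
    while H ** (t - 1) < 2 ** kappa:
        t += 1
    return t


# With about a thousand honest parties and an 80-bit target, ten copies suffice.
t_small = min_copies(1000, 80)
```

The number of copies shrinks as the number of honest parties grows, since $t - 1$ scales like $\kappa / \log_2 H$.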
Protocol 4 (Continued, including verifications). 

7. Verifications. 

(a) Explicit Verification. Input Verify to $\pi^{cev}$. If it outputs (Culprit, $d$), then halt 
with this output, and otherwise let (SecretKey, $sk^{cev}$) be the output. 

(b) Replication Check. Hand Recover to $\mathcal{F}_r^{dkg}$ and wait for a response 
(SecretKey, $sk^{r}$). Compute $L^{r} = \mathsf{Dec}_{sk^{r}}(L^{cpt})$ and let $B$ be the ciphertexts 
in $L^{cpt}$ that do not have exactly $t$ copies after decryption with $sk^{r}$. 

(c) Backwards Tracing. Input (TraceB, $B$) to $\pi^{cpt}$. If it outputs (Culprit, $d$), 
then halt with this output, and otherwise let (TracedB, $B'$) be the output. 
Let $L'$ be the list of all $u \in L^{in}$ such that $\pi^{cev}$ on input 
(Ciphertexts, $\mathsf{Split}_t(\mathsf{Dec}_{sk^{o}}(u))$) would output (Mixed, $B''$) with $B' \cap 
B'' \neq \emptyset$. 

(d) Forward Tracing. Let $F$ be the list such that $\pi^{cev}$ on input (Ciphertexts, $L' \circ 
L''$), where $L'' = \bigcirc_{\ell \in [1,k]} \mathsf{Split}_t(\mathsf{Dec}_{sk^{o}}(u_{N+\ell}))$, would give an output 
(Mixed, $F$). Input (TraceF, $F$) to $\pi^{cpt}$. If it outputs (Culprit, $d$), then halt 
with this output. Otherwise, let (TracedF, $F'$) be the output. 

8. Final Decryption. Hand Recover to $\mathcal{F}_f^{dkg}$ and wait for a response 
(SecretKey, $sk^{f}$). Let $L^{r'} = \mathsf{Unique}(\mathsf{Dec}_{sk^{r}}(L^{cpt} \setminus F'))$ and halt with output 
(Mixed, $\mathsf{Sort}(\mathsf{Dec}_{sk^{f}}(L^{r'}))$). 


7 Conclusion 

We construct a provably secure mix-net that, unlike many other mix-nets in the literature, 
does not require any homomorphic properties from the cryptosystem. This is a clear 
advantage for those concerned that quantum computers can be constructed in the future. 
In contrast to the only previously proposed mix-net based on any cryptosystem, our 
construction enjoys not only provable security but also full privacy and correctness. Our 
mix-net is fast when there are many senders and plaintexts are large. 
A Agreement Subroutine 


Subroutine 1 (Agree(Tag)). 

1. Set $J \leftarrow \{1, \ldots, k\}$. 

2. While $J \neq \emptyset$: 

a) Wait for a message (Tag, $m_j$) from the dummy mix-server $\mathcal{M}_j$ with $j \in J$. 

b) Set $J \leftarrow J \setminus \{j\}$ and hand (TagReceived, $j$, $m_j$) to $\mathcal{S}$. 

3. Return the value $m$ that has been submitted by $\lambda$ of the mix-servers. 
If no such value exists, hand $\bot$ to $\mathcal{M}$ and halt the main functionality. 




B Functionalities Implemented by General MPC 


Functionality 4 (Key Generation with VSS). The key generation with VSS 
functionality $\mathcal{F}_j^{kg}$ executing with dummy mix-servers $\mathcal{M}$ and ideal adversary $\mathcal{S}$ proceeds 
as follows. 

1. Generate $(pk, sk) = \mathsf{Gen}(1^n)$, hand (PublicKey, $pk$) to $\mathcal{S}$ and $\mathcal{M}$, and 
(SecretKey, $sk$) to $\mathcal{M}_j$, and wait until dummy mix-servers agree to recover, i.e., 
run Agree(Recover). 

2. Hand (SecretKey, $sk$) to $\mathcal{S}$ and $\mathcal{M}$. 


Functionality 5 (Distributed Key Generation with VSS). The distributed key 
generation with VSS functionality $\mathcal{F}^{dkg}$ executing with dummy mix-servers $\mathcal{M}$ and 
ideal adversary $\mathcal{S}$ proceeds as follows. 

1. Generate $(pk, sk) = \mathsf{Gen}(1^n)$. 

2. Hand (PublicKey, $pk$) to $\mathcal{S}$ and $\mathcal{M}$, and wait until dummy mix-servers agree to 
recover, i.e., run Agree(Recover). 

3. Hand (SecretKey, $sk$) to $\mathcal{S}$ and $\mathcal{M}$. 


Functionality 6 (Bulletin board). Executing with dummy senders $\mathcal{P}$, dummy mix-servers
$\mathcal{M}$ and ideal adversary $\mathcal{S}$, the bulletin board functionality $\mathcal{F}_{\mathrm{bb}}$ keeps a private
and a public$^a$ database and proceeds as follows.

1. Upon receiving a message (Tag, $m$) from a party $P \in \mathcal{P} \cup \mathcal{M}$, hand ($P$, Tag, $m$)
to $\mathcal{S}$ and write ($P$, Tag, $m$) on the private database. Ignore any further message
(Tag, $m'$) from the party $P$.

2. Upon receiving a message ($P$, Tag, $m$) from $\mathcal{S}$, see if ($P$, Tag, $m$) already exists
in the private database. If so, then write ($P$, Tag, $m$) on the public database.
Ignore any further message ($P$, Tag, $m$) from $\mathcal{S}$.

$^a$ The contents of the public database are known to all parties. In our protocols, parties need
to wait until a specific party $P$ publishes (Tag, $m$) on the bulletin board. This means that
they wait until ($P$, Tag, $m$) appears on the public database.


Functionality 7 (Proof of Correct Decryption). The proof of correct decryption
functionality $\mathcal{F}_{\mathrm{pd}}$ executing with dummy mix-servers $\mathcal{M}$ and ideal adversary $\mathcal{S}$
proceeds as follows.

1. Wait for an input (SecretKey, $sk$) from dummy mix-server $\mathcal{M}_j$ and then hand
(SecretKey, $j$) to $\mathcal{S}$.

2. Wait for a common input $(m, c)$ from dummy mix-servers, i.e., $(m, c) =$
Agree(Verify), and send True or False to $\mathcal{S}$ and $\mathcal{M}$ depending on whether $m =
\mathsf{Dec}_{sk}(c)$ or not.


Abstract. The Fiat-Shamir transformation is the most efficient
construction of non-interactive zero-knowledge proofs.

This paper is concerned with two variants of the transformation that 
appear but have not been clearly delineated in existing literature. Both 
variants start with the prover making a commitment. The strong vari- 
ant then hashes both the commitment and the statement to be proved, 
whereas the weak variant hashes only the commitment. This minor 
change yields dramatically different security guarantees: in situations 
where malicious provers can select their statements adaptively, the weak 
Fiat-Shamir transformation yields unsound/unextractable proofs. Yet 
such settings naturally occur in systems when zero-knowledge proofs 
are used to enforce honest behavior. We illustrate this point by show- 
ing that the use of the weak Fiat-Shamir transformation in the Helios 
cryptographic voting system leads to several possible security breaches: 
for some standard types of elections, under plausible circumstances, ma- 
licious parties can cause the tallying procedure to run indefinitely and 
even tamper with the result of the election. 

On the positive side, we define a form of adaptive security for
zero-knowledge proofs in the random oracle model (essentially
simulation-sound extractability), and show that a variant which we call
strong Fiat-Shamir yields secure non-interactive proofs.

This level of security was assumed in previous works on Helios and our 
results are then necessary for these analyses to be valid. Additionally, we 
show that strong proofs in Helios achieve non-malleable encryption and 
satisfy ballot privacy, improving on previous results that required CCA 
security. 


1 Introduction 

Zero-knowledge proofs of knowledge allow a prover to convince a verifier that 
she holds information satisfying some desirable properties without revealing any- 
thing else. To be useful, such proof systems should satisfy completeness (the 
prover can convince the verifier that a true statement is indeed true) and sound- 
ness (the prover cannot convince the verifier that a false statement is true). 
Zero-knowledge proofs can either be interactive or non-interactive; for the latter
the prover only sends his proof and the verifier decides to accept or reject the
statement without any further interaction. 

The focus of this paper is on the most common and efficient construction 
of non-interactive proofs, namely the Fiat-Shamir heuristic [1]. Here, one begins
with an interactive sigma protocol, a special type of three-move protocol in which 
the prover sends a commitment, the verifier answers with a random challenge 
and the prover completes the protocol with a response. The idea behind the 
transformation is simple and appealing: have the prover compute the message 
of the verifier as the hash of the message sent by the prover — if the hash is 
modelled as a random oracle the message computed this way should look random 
as in an interactive execution, hence the properties of the original proof system 
should somehow be preserved. 

The transformation appears in the literature in two different forms, depending 
on what is hashed. In the formalization of Bellare and Rogaway [2], which we
refer to as the weak Fiat-Shamir transformation (wFS), the hash takes only the
prover’s first message as input. Other papers, e.g. [?], suggest including the
statement to be proved in the hash input. In the remainder of the paper we call 
this the strong Fiat-Shamir transformation (sFS). 

Contributions. The contributions of this paper fall into two main categories. 
First we identify weaknesses of the weak (sic!) Fiat-Shamir transformation and 
show that in applications it can be a serious source of insecurity. Secondly, we 
provide several positive results regarding the strong Fiat-Shamir transformation 
and its uses in applications. 

Insecurity of wFS and Attacks on Helios. Our first results show that the security
proofs commonly given for Fiat-Shamir proofs do not hold when applied to weak
proofs and when the prover can choose his statement(s) to prove adaptively. This
may or may not render a protocol using them insecure, as a protocol may have 
other means of dealing with adaptivity. For example, in the original application 
to identification protocols, weak proofs are sufficient. 

As an example where weak proofs do not yield security, we consider Helios
[?], a cryptographic voting protocol that has been used in practice. Versions
of Helios have been employed, for example, for the election of the president of
the Université catholique de Louvain [?], the Princeton University Undergraduate
Student Government [7] and the board of the IACR [8]. We focus on the
zero-knowledge protocols implemented since Helios 2.0 [?] for elections based on
homomorphic tallying, which are still used in the latest version of Helios as
documented on [?] at the time of writing. In brief, those elections work as follows.
Trustees first jointly generate an election public key, using NIZK proofs to make 
sure that this key actually includes contributions from all trustees. Then, to cast 
a ballot, a voter encrypts a vote and attaches NIZK proofs that the vote is le- 
gal. All ballots are placed on a publicly readable bulletin board. Eventually, the 
election administrators homomorphically add all ballots, decrypt the result and 
use NIZK to prove the correctness of their actions. The encryption scheme is
exponential ElGamal and the particular NIZKs involved are obtained by applying
the weak Fiat-Shamir transformation to the Schnorr [?], Chaum-Pedersen [?]
and disjunctive Chaum-Pedersen protocols (and variants thereof). These proofs
are used to guarantee that the privacy of the votes rests on all trustees, to en- 
force that voters create ballots containing valid votes and to prevent dishonest 
administrators from claiming a wrong result. We show that the use of the wFS 
transformation is the source of three types of insecurity: 

a) breaking verifiability by allowing colluding administrators to cast a single 
ballot that is not well-formed and contains any chosen number of votes for a 
specific candidate, 

b) breaking liveness of the system by allowing colluding administrators to fail
to provide the election outcome while proving that they behave honestly, or
by allowing voters to cast a random vote which leads to tallying taking
superpolynomial time, and

c) breaking privacy by allowing the casting of related ballots that do not contain 
mere copies of previously submitted ciphertexts. 

The first two of these attacks are undetectable under normal circumstances. 

While our focus is on Helios which is our motivating application, in the full ver- 
sion of our paper we also show attacks against schemes constructed via the Naor- 
Yung paradigm and via the encrypt-then-prove construction: when using proofs 
derived through wFS these constructions may yield malleable encryption schemes. 

Security of Strong Fiat-Shamir and Applications. The problems that we have
identified in the use of the wFS do not apply to proofs obtained through the
strong version of the transformation. It is then natural to ask what level of
security one gets from these proofs. We provide several results. First, we
formulate a security notion for non-interactive zero-knowledge proofs of knowledge
which captures adversaries that can choose their statements adaptively. In
essence, this notion is the analogue of simulation-sound extractability defined by
Groth in the common reference string model [?]. Informally, a malicious prover
is allowed to see simulated proofs (of potentially fake statements) and aims to
provide valid looking proofs for adaptively chosen statements in such a way that
an extractor cannot obtain witnesses. Interestingly, our definition is not simply a
rehashing of the notion in [?]. In the random-oracle model, extraction requires
the rewinding of the prover (as opposed to merely using a trapdoor) and in turn,
this implies complex interaction between the adversary, the simulator and the
extractor. We then show that applying sFS to $\Sigma$-protocols results in protocols
that are simulation-sound extractable. Our result seems to be the first thorough
investigation of the precise security guarantees offered by such proofs.

As a first application of this result, we investigate the security of non-malleable 
encryption schemes that are built by combining an IND-CPA encryption scheme 
with a proof of knowledge of the randomness used in the encryption process. 
We refer to this construction as the Enc+PoK approach. A well-known instantiation
is the TDH0 scheme introduced and studied by Shoup and Gennaro [?].
Intuitively the construction should achieve IND-CCA security but so far, all
attempts have failed to confirm or disprove this under natural assumptions (e.g.,
DDH in the random oracle model) [14,15]. As a consequence, the form of
non-malleability ensured by Enc+PoK schemes is, surprisingly, still unknown. We
provide a lower bound on the answer to this question: if the proof of knowledge
used in the encryption process is simulation-sound extractable, then the resulting
scheme is NM-CPA secure. An immediate corollary is that the TDH0 scheme
is NM-CPA secure in the random oracle model under the DDH assumption.

We then turn to the analysis of ballot privacy in Helios. Prior work shows that 
ballot privacy is guaranteed if the encryption scheme used in the construction is 
IND-CCA [16,17]. Since ballots in Helios use the Enc+PoK paradigm (which, as
discussed above, is not known to be IND-CCA) a natural suggestion is then to
replace it with something stronger. For example, Bernhard et al. suggested applying
the Naor-Yung transformation to the underlying ElGamal encryption [?],
while Bulens et al. used a variant of the TDH2 scheme [?]. These modifications
both substantially increase the computational costs of the system and require 
major changes in the implementation. 

Our final result is to show that although the NM-CPA notion is strictly weaker 
than IND-CCA [19], it is sufficient to ensure ballot privacy. In particular, a minor
tweak of the Enc+PoK construction currently used in Helios where we replace 
wFS with its strong counterpart and check for repeated ciphertexts is sufficient. 
The change that we require is easily accomplished by including additional el- 
ements in the inputs of the hash function and preserves the current level of 
efficiency. 


2 The Fiat-Shamir/Blum Transformation 

In this section we introduce the two variants of the Fiat-Shamir heuristic that we 
analyze. We start by fixing notation and recalling some standard notions. In the 
following we let $R \subseteq \{0,1\}^* \times \{0,1\}^*$ be an efficiently computable relation. $R$
defines a language $\mathcal{L}_R = \{Y \in \{0,1\}^* \mid \exists w : R(w, Y)\}$ in NP. We further assume
that there is a well-defined set $\Lambda \supseteq \mathcal{L}_R$ decidable in polynomial time.$^1$

A non-interactive proof system for language $\mathcal{L}_R$ is a pair of algorithms (Prove,
Verify). Such a proof system is complete for $\mathcal{L}_R$ if for every $(w, Y) \in R$, with
overwhelming probability, if $\pi \leftarrow \mathsf{Prove}(w, Y)$ then $\mathsf{Verify}(Y, \pi) = 1$. We define
soundness of such proof systems (the property that a cheating prover cannot
make the verifier accept a false statement) later in the paper. Here we recall the
notion of zero-knowledge in the random oracle model [12,13].

In this setting, a simulator S for a proof system is an algorithm in charge of 
answering random oracle queries and producing valid proofs for any statement 
$Y \in \Lambda$ with respect to this oracle. In particular, it can “patch” the oracle to
create its simulated proofs. Such a simulator responds to the following queries: 

$^1$ Suppose that $\mathcal{L}_R$ is the set of DDH triples $(G^a, G^b, G^{ab})$ over some group $\mathbb{G}$. Then
$\Lambda$ could be $\mathbb{G}^3$. The reason for defining this formally is that we will later expect
our zero-knowledge simulator to produce valid “proofs” for some “false statements”,
but which ones? Can it produce a proof for the statement consisting of the empty
string, for example? We use $\Lambda$ as the class of statements on which the simulator can
produce proofs.


630 D. Bernhard, O. Pereira, and B. Warinschi 


$\mathcal{H}(s)$: $\mathcal{S}$ maintains a list of oracle query/response pairs. For repeated queries, $\mathcal{S}$
answers consistently; for fresh queries, $\mathcal{S}$ draws a random value $r$, adds $(s, r)$
to its list and returns $r$.

Simulate$(Y)$: For $Y \in \Lambda$, the simulator returns a proof $\pi$ such that Verify$(Y, \pi) =
1$ if the verifier uses the simulator for its oracle queries. $\mathcal{S}$ can add
query/response pairs to its oracle list to process a simulation query.

Definition 1 (Zero-Knowledge). A proof system is zero-knowledge if there
is a simulator $\mathcal{S}$ such that no adversary who can make queries to the random
oracle and queries of the form create-proof$(w, Y)$ can distinguish the following
two settings with probability non-negligibly better than 1/2.

1. Random oracle queries are answered by a random oracle. In response to
create-proof$(w, Y)$, the challenger checks that $R(w, Y)$ holds. If not, he returns $\bot$.
Otherwise, he returns Prove$(w, Y)$.

2. The challenger runs a copy of the simulator $\mathcal{S}$. It forwards random oracle
queries to $\mathcal{S}$ directly. For create-proof$(w, Y)$, the challenger checks if
$R(w, Y)$ holds: if not, the challenger returns $\bot$; if it holds, the challenger
sends Simulate$(Y)$ to $\mathcal{S}$ and returns the result to the adversary.

Sigma Protocols. A sigma protocol for a language $\mathcal{L}_R$ is a protocol for two parties,
a prover and a verifier. Both share a statement $Y \in \mathcal{L}_R$ as input and the prover
may additionally hold a witness $w$.

The prover begins by sending a value $A$ known as the commitment. The
verifier replies with a challenge $c$ drawn uniformly from a fixed challenge set.
The prover finishes the protocol with a response $f$, whereupon the verifier applies
a deterministic algorithm Verify to $Y$, $A$, $c$ and $f$ which can accept or reject this
execution.

A sigma protocol is correct (w.r.t. $\mathcal{L}_R$) if the prover, on input a pair $(w, Y)$
satisfying $R$ and interacting with the verifier who has input $Y$, gets the verifier
to accept with probability 1.

A sigma protocol has special honest verifier zero knowledge if there is an
algorithm Simulate that takes as input a statement $Y \in \Lambda$, challenge $c$ and
response $f$ and outputs a commitment $A$ such that Verify$(Y, A, c, f) = 1$ and
furthermore, if $c$ and $f$ were chosen uniformly at random from their respective
domains then the triple $(A, c, f)$ is distributed identically to that of an execution
between the prover and the verifier. Notice that the verifier is supposed to work
with statements that may be false.

A sigma protocol has special soundness if there is an algorithm Extract that
takes as input a statement $Y$ and any two triples $(A, c, f)$ and $(A', c', f')$ such
that both verify w.r.t. $Y$, $A = A'$ and $c \neq c'$, and returns a witness $w$ such that
$R(w, Y)$.
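
To make special soundness concrete, the following minimal sketch runs an Extract algorithm for one particular sigma protocol, the Schnorr protocol for knowledge of $x$ with $X = G^x$. The protocol choice, the toy group parameters and the function name `extract` are assumptions made for illustration; they are not taken from the text above.

```python
# Toy Schnorr group, assumed for illustration only: P = 2Q + 1 with P, Q prime,
# and G = 4 generates the subgroup of order Q. Far too small for real use.
P, Q, G = 2039, 1019, 4


def extract(X, t1, t2):
    """Recover a witness x with G^x = X from two accepting transcripts
    (A, c, f) and (A, c', f') that share the commitment A."""
    (A1, c1, f1), (A2, c2, f2) = t1, t2
    if A1 != A2 or (c1 - c2) % Q == 0:
        raise ValueError("need equal commitments and distinct challenges")
    # Both transcripts satisfy f = a + c*x (mod Q), so
    # f1 - f2 = (c1 - c2) * x and we can divide by the challenge difference.
    w = (f1 - f2) * pow((c1 - c2) % Q, -1, Q) % Q
    if pow(G, w, P) != X:
        raise ValueError("transcripts do not verify for this statement")
    return w


# Two honest transcripts with the same commitment but different challenges.
x, a = 321, 77
X, A = pow(G, x, P), pow(G, a, P)
t1 = (A, 5, (a + 5 * x) % Q)
t2 = (A, 9, (a + 9 * x) % Q)
```

Calling `extract(X, t1, t2)` returns the witness; this is exactly the step that a rewinding extractor performs after forking the prover on the challenge.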

The Fiat-Shamir Transformation. The Fiat-Shamir transformation [1] (which
[2] attributes to Blum) is a technique to make sigma protocols non-interactive
using a cryptographic hash function. There are two commonly used descriptions
of this technique that we call weak and strong Fiat-Shamir and which we describe
together in the following definition.

Definition 2 (Fiat-Shamir Transformation). Let $\Sigma = (\mathsf{Prove}_\Sigma, \mathsf{Verify}_\Sigma)$ be
a sigma protocol and $\mathcal{H}$ a hash function. The weak Fiat-Shamir transformation
of $\Sigma$ is the proof system $\mathsf{wFS}_{\mathcal{H}}(\Sigma) = (\mathsf{Prove}, \mathsf{Verify})$ defined as follows:

Prove$(w, Y)$: Run $\mathsf{Prove}_\Sigma(w, Y)$ to obtain commitment $A$. Compute $c \leftarrow \mathcal{H}(A)$.
Complete the run of $\mathsf{Prove}_\Sigma$ with $c$ as input to get the response $f$. Output
the pair $(c, f)$.

Verify$(Y, c, f)$: Compute $A$ from $(Y, c, f)$, then run $\mathsf{Verify}_\Sigma(Y, A, c, f)$.

The strong Fiat-Shamir transformation of $\Sigma$, i.e., $\mathsf{sFS}_{\mathcal{H}}(\Sigma) = (\mathsf{Prove}, \mathsf{Verify})$, is
obtained as above with the difference that $c$ is computed by $c \leftarrow \mathcal{H}(Y, A)$.
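
The two variants differ only in the hash input, which the following minimal sketch illustrates for the Schnorr protocol. The toy group, the SHA-256 based stand-in `H` for the random oracle and the `strong` flag are assumptions made for illustration, not part of the definition.

```python
import hashlib
import secrets

# Toy parameters, assumed for illustration only: P = 2Q + 1 with P, Q prime,
# and G = 4 generates the subgroup of order Q. Far too small for real use.
P, Q, G = 2039, 1019, 4


def H(*items):
    """Hash integers into Z_Q; a stand-in for the random oracle."""
    data = "|".join(str(i) for i in items).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def challenge(X, A, strong):
    # Weak variant: hash the commitment only.
    # Strong variant: hash the statement together with the commitment.
    return H(X, A) if strong else H(A)


def prove(x, X, strong):
    """Non-interactive Schnorr proof of knowledge of x with X = G^x."""
    a = secrets.randbelow(Q)
    A = pow(G, a, P)
    c = challenge(X, A, strong)
    f = (a + c * x) % Q
    return c, f


def verify(X, proof, strong):
    c, f = proof
    # Recompute the commitment A = G^f / X^c from (X, c, f).
    A = pow(G, f, P) * pow(X, -c, P) % P
    return c == challenge(X, A, strong)
```

For an honest prover both variants verify; the difference only matters once the prover may pick the statement `X` after seeing the challenge.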

3 Pitfalls of the Weak Fiat-Shamir Transformation 

We now describe various standard protocols in which the use of the weak Fiat- 
Shamir transformation can have undesirable effects. We illustrate these effects 
through several new practical attacks on various components of the Helios voting 
system, which relies on these protocols. 

Schnorr Proofs. The Schnorr [?] signature scheme is the weak Fiat-Shamir
transformation of the Schnorr identification protocol. In a group $\mathbb{G}$ of order $q$
generated by $G$, it proves knowledge of an exponent $x$ satisfying the equation
$X = G^x$ for a known $X$. Viewing $(x, X)$ as a signing/verification key pair and
including a message in the hash input yields a signature of knowledge.

To create a proof, the prover picks a random $a \leftarrow \mathbb{Z}_q$ and computes $A = G^a$.
He then hashes $A$ to create a challenge $c = \mathcal{H}(A)$. Finally he computes $f = a + cx$;
the proof is the pair $(c, f)$ and the verification procedure consists in checking
the equation $c = \mathcal{H}(G^f / X^c)$.

The weak Fiat-Shamir transformation can safely be used here, as discussed
in previous analyses [?], since the public key $X$ is selected first and given as
input to the adversary who tries to produce a forgery.

However, if the goal of the adversary is to build a valid triple $(X, c, f)$ for any
$X$ of his choice, then this protocol is not a proof of knowledge anymore unless
the discrete logarithm problem is easy in $\mathbb{G}$. Suppose indeed that there is an
extractor $\mathcal{K}$ that, by interacting with any prover $\mathcal{P}$ that provides a valid triple
$(X, c, f)$, extracts $x = \log_G(X)$. This extractor can be used to solve an instance
$Y$ of the discrete logarithm problem with respect to $(\mathbb{G}, G)$ as follows: use $Y$ as
the proof commitment, compute $c = \mathcal{H}(Y)$, choose $f \leftarrow \mathbb{Z}_q$ and set $X = (G^f / Y)^{1/c}$.

Since the proof $(Y, c, f)$ passes the verification procedure for statement $X$, the
extractor $\mathcal{K}$ should be able to compute $x = \log_G(X)$ by interacting with our
prover. We now observe that, by taking the discrete logarithm in base $G$ on
both sides of the definition of $X$, we obtain the solution $\log_G(Y) = f - cx$ to
the discrete logarithm challenge.
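
The construction of the statement from the proof can be sketched as follows. This is a toy illustration under assumed parameters (the small group, the SHA-256 stand-in `H`, and the helper names `forge` and `demo` are all hypothetical); in the demo the discrete logarithm `y` is known only so that the final relation can be checked.

```python
import hashlib
import secrets

# Toy parameters, assumed for illustration only: P = 2Q + 1 with P, Q prime,
# and G = 4 generates the subgroup of order Q.
P, Q, G = 2039, 1019, 4


def H(*items):
    """Hash integers into Z_Q; a stand-in for the random oracle."""
    data = "|".join(str(i) for i in items).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def weak_verify(X, c, f):
    """Weak Schnorr verification: c = H(G^f / X^c)."""
    A = pow(G, f, P) * pow(X, -c, P) % P
    return c == H(A)


def forge(Y):
    """Use Y as the commitment and derive the statement X afterwards."""
    c = H(Y)
    if c == 0:
        raise ValueError("challenge is not invertible; try another Y")
    f = secrets.randbelow(Q)
    # X = (G^f / Y)^(1/c), so that G^f / X^c = Y and H(Y) = c.
    base = pow(G, f, P) * pow(Y, -1, P) % P
    X = pow(base, pow(c, -1, Q), P)
    return X, c, f


def demo():
    """Pick a discrete-log instance Y = G^y and forge a weak proof from it."""
    while True:
        y = secrets.randbelow(Q)
        Y = pow(G, y, P)
        if H(Y) != 0:
            break
    X, c, f = forge(Y)
    return y, X, c, f
```

The forged triple verifies although `forge` never uses a discrete logarithm; anyone who could extract `x` from it would learn `y = f - c*x`.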

Application to Helios. Schnorr proofs are used during the key generation pro- 
cedure of Helios as a way to prevent trustees from choosing their public key as 
a function of the public key of the other trustees, which could give them the
possibility to select the election private key at will and to decrypt all individual
votes [?]. While the scenario above shows that trustees who publish a public key
together with a Schnorr proof for that public key do not necessarily know the 
corresponding private key, the fact that our scenario does not allow the prover to 
choose his statement (but just to compute it as a function of the elements of the 
proof) does not seem to give rise to any practical attack. These weak Schnorr 
proofs would, however, break the proof of ballot privacy that we give later in 
this paper (assuming strong proofs). 

Chaum-Pedersen Proofs. Chaum and Pedersen [?] introduced a proof of discrete
logarithm equality, which they make non-interactive using the strong form
of the Fiat-Shamir transformation. More precisely, given two group elements
$(G, X)$, a prover who knows the discrete logarithm $x = \log_G(X)$ can prove that
two group elements $(R, S)$ satisfy the relation $\log_G(X) = \log_R(S)$ as follows. He
picks a random $a \leftarrow \mathbb{Z}_q$, computes $A = G^a$, $B = R^a$, $c = \mathcal{H}(R, S, A, B)$ and
$f = a + cx$. The proof is the pair $(c, f)$ and the verification procedure consists
in checking the equation $c = \mathcal{H}(R, S, G^f / X^c, R^f / S^c)$.

We observe that this proof is not sound anymore if it is used as a proof
that three elements $(X, R, S)$ are such that $\log_G(X) = \log_R(S)$, that is, if the
prover also has the possibility to choose $X$ in the process of building his proof.
Indeed, a prover could select $(a, b, r, s) \leftarrow \mathbb{Z}_q^4$ at random, compute $A = G^a$,
$B = G^b$, $R = G^r$ and $S = G^s$ from which he can compute $c = \mathcal{H}(R, S, A, B)$ and
$f = (b + cs)/r$. He now completes the proof by computing $x = (f - a)/c$ and setting
$X = G^x$. Now, we observe that $\log_G(X) = \frac{s}{r} + \frac{b - ar}{rc}$ while $\log_R(S) = \frac{s}{r}$, which
differ with overwhelming probability.
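
The cheating prover described above can be sketched in a few lines. The toy group, the SHA-256 stand-in `H` and the function names are assumptions for illustration; the sketch resamples until $c \neq 0$, $r \neq 0$ and $b \neq ar$, which the text covers by "with overwhelming probability".

```python
import hashlib
import secrets

# Toy parameters, assumed for illustration only: P = 2Q + 1 with P, Q prime,
# and G = 4 generates the subgroup of order Q.
P, Q, G = 2039, 1019, 4


def H(*items):
    """Hash integers into Z_Q; a stand-in for the random oracle."""
    data = "|".join(str(i) for i in items).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def verify(X, R, S, c, f):
    """Chaum-Pedersen check with c = H(R, S, A, B); X is not hashed."""
    A = pow(G, f, P) * pow(X, -c, P) % P
    B = pow(R, f, P) * pow(S, -c, P) % P
    return c == H(R, S, A, B)


def cheat():
    """Produce (X, R, S) with log_G(X) != log_R(S) and an accepting proof."""
    while True:
        a, b, s = (secrets.randbelow(Q) for _ in range(3))
        r = 1 + secrets.randbelow(Q - 1)
        A, B, R, S = (pow(G, e, P) for e in (a, b, r, s))
        c = H(R, S, A, B)
        if c != 0 and (b - a * r) % Q != 0:
            break
    f = (b + c * s) * pow(r, -1, Q) % Q   # makes R^f / S^c equal B
    x = (f - a) * pow(c, -1, Q) % Q       # makes G^f / X^c equal A
    return pow(G, x, P), R, S, c, f, x
```

The returned proof verifies, yet `R**x != S` in the group, so the statement is false.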

Application to Helios. Chaum-Pedersen proofs instantiated with the weak
Fiat-Shamir transformation (that is, $c = \mathcal{H}(A, B)$) are used during the ElGamal
decryption procedure of Helios, in order to demonstrate that the decryption of
the product of the votes that is computed by the trustees is consistent with the
public key. More precisely, given a public key $X$ and a ciphertext $(R, S)$ that
encrypts the sum of all votes, a trustee is required to compute $T = R^x$ where
$x = \log_G(X)$ and to publish it together with a Chaum-Pedersen proof that
$\log_G(X) = \log_R(T)$. The ElGamal decryption is then computed as $\log_G(S/T)$.

In this proof, a malicious trustee does not have the possibility to choose his
private key at decryption time, but has the possibility to select $T$ as part of
the proof computation process. He can do so as follows. Select $(a, b) \leftarrow \mathbb{Z}_q^2$ at
random, compute the proof commitments $A = G^a$ and $B = G^b$, the challenge
$c = \mathcal{H}(A, B)$ and the response $f = a + cx$. Eventually, compute the decryption
factor $T = (R^f / B)^{1/c}$. It is easy to verify that the proof $(c, f)$ is valid for the tuple
$(G, X, R, T)$, but that $\log_R(T) = x + \frac{ar - b}{rc}$ with $r = \log_G(R)$, which will be different from $x$ with
overwhelming probability. As a result, the decryption procedure will provide an
aberrant result: an essentially random element of $\mathbb{Z}_q$. This strategy provides a
way to build a denial of service attack against a Helios election, without anyone
being able to detect who was responsible.
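
A minimal sketch of this decryption-factor forgery follows, again over an assumed toy group with a SHA-256 stand-in `H`; the name `forge_factor` is hypothetical. The values `a` and `b` are returned only so that the relation between $T$ and $R^x$ can be checked afterwards.

```python
import hashlib
import secrets

# Toy parameters, assumed for illustration only: P = 2Q + 1 with P, Q prime,
# and G = 4 generates the subgroup of order Q.
P, Q, G = 2039, 1019, 4


def H(*items):
    """Hash integers into Z_Q; a stand-in for the random oracle."""
    data = "|".join(str(i) for i in items).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def verify(X, R, T, c, f):
    """Weak Chaum-Pedersen check that log_G(X) = log_R(T), c = H(A, B)."""
    A = pow(G, f, P) * pow(X, -c, P) % P
    B = pow(R, f, P) * pow(T, -c, P) % P
    return c == H(A, B)


def forge_factor(x, R):
    """Return a wrong decryption factor T with an accepting proof (c, f)."""
    while True:
        a, b = secrets.randbelow(Q), secrets.randbelow(Q)
        c = H(pow(G, a, P), pow(G, b, P))
        if c != 0:                         # c must be invertible mod Q
            break
    f = (a + c * x) % Q
    # T = (R^f / B)^(1/c), so that R^f / T^c = B.
    base = pow(R, f, P) * pow(G, -b, P) % P
    T = pow(base, pow(c, -1, Q), P)
    return T, c, f, a, b
```

The proof is accepted for the forged `T`, and `T` equals the honest factor `R**x` only in the unlikely case `b == a*r (mod Q)`.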

A more dangerous attack can be mounted if we assume that the trustees have
the possibility to passively eavesdrop on the randomness of all voters. Though 
demanding, such an attack is still easier to mount and harder to detect than a 
full active attack. We would expect the impact of such a scenario to be “only” 
a complete loss of privacy by the voters but we show that it actually provides a 
way for the trustees to announce any election outcome of their choice as soon as 
they can actively corrupt a single voter (which can happen simply if a trustee is 
a voter himself). 

Consider an election with trustees who would like to announce the election
outcome $m$. These trustees select a private key $x$ and publish the public key
$X$. They also select $(a, b) \leftarrow \mathbb{Z}_q^2$, compute $A = G^a$, $B = G^b$, $c = \mathcal{H}(A, B)$ and
$f = a + cx$. Then all the voters submit their votes, except the corrupted one who
waits until the last minute of the election. At that time, the trustees compute the
product of all encrypted votes that have been submitted and obtain a ciphertext
$(R', S') = (G^{r'}, G^{m'} \cdot G^{xr'})$ for some values $r'$ and $m'$ that they can compute
using the randomness of the voters. They now compute $r = (b + c(m' - m))/a$, as well
as a ciphertext $(G^{r - r'}, G^{x(r - r')})$ which is an encryption of 0 for which they can
compute a proof of validity since they know $r - r'$. This ciphertext and proof
are submitted by the corrupted voter, with the effect that the product of all
encrypted votes is $(R, S) = (G^r, G^{m'} \cdot G^{xr})$. It can now be verified that $(c, f)$
form a valid proof that $\log_G(X) = \log_R(S / G^m)$, which indicates that $m$ is the
outcome of the election.
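
The steps of this attack can be traced in a short sketch. The toy group, the SHA-256 stand-in `H` and the helper names `precompute`, `last_ballot` and `run` are assumptions for illustration; the proof of validity attached to the corrupted voter's ballot is omitted.

```python
import hashlib
import secrets

# Toy parameters, assumed for illustration only: P = 2Q + 1 with P, Q prime,
# and G = 4 generates the subgroup of order Q.
P, Q, G = 2039, 1019, 4


def H(*items):
    """Hash integers into Z_Q; a stand-in for the random oracle."""
    data = "|".join(str(i) for i in items).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def verify(X, R, T, c, f):
    """Weak Chaum-Pedersen check that log_G(X) = log_R(T), c = H(A, B)."""
    A = pow(G, f, P) * pow(X, -c, P) % P
    B = pow(R, f, P) * pow(T, -c, P) % P
    return c == H(A, B)


def precompute(x):
    """Before the votes are cast: fix the proof (c, f), keep (a, b)."""
    a = 1 + secrets.randbelow(Q - 1)       # a must be invertible mod Q
    b = secrets.randbelow(Q)
    c = H(pow(G, a, P), pow(G, b, P))
    return a, b, c, (a + c * x) % Q


def last_ballot(X, a, b, c, r_sum, m_sum, m_target):
    """Encryption of 0 that moves the tally randomness from r' to r."""
    r = (b + c * (m_sum - m_target)) * pow(a, -1, Q) % Q
    d = (r - r_sum) % Q
    return pow(G, d, P), pow(X, d, P)


def run(m_sum, m_target):
    x = secrets.randbelow(Q)
    X = pow(G, x, P)
    a, b, c, f = precompute(x)
    r_sum = secrets.randbelow(Q)           # learned by eavesdropping
    R1 = pow(G, r_sum, P)
    S1 = pow(G, m_sum, P) * pow(X, r_sum, P) % P
    R0, S0 = last_ballot(X, a, b, c, r_sum, m_sum, m_target)
    R, S = R1 * R0 % P, S1 * S0 % P
    T = S * pow(G, -m_target, P) % P       # announced decryption factor
    decrypts_to_target = S * pow(T, -1, P) % P == pow(G, m_target, P)
    return verify(X, R, T, c, f), decrypts_to_target
```

With a true tally of, say, 10 and a target of 3, `run(10, 3)` yields an accepted proof for a decryption that opens to the target.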

Disjunctive Chaum-Pedersen Proofs. Disjunctive proofs allow proving 
that one of two statements holds without revealing which one is correct. These 
proofs have numerous applications. For instance, they can be used by a voter to 
demonstrate that a ciphertext he produced is an encryption of either 0 or 1 (but 
nothing else) , expressing whether or not he supports a candidate. 

Suppose that a voter builds an exponential ElGamal ciphertext $(R, S)$ with
respect to public key $X$ and wants to prove that it encrypts 0 or 1. We consider
the case where it is an encryption of 1 (the other case is similar). First, the voter
simulates a proof that $\log_R(S) = x$ by selecting a random proof $(c_0, f_0) \leftarrow \mathbb{Z}_q^2$
and computing $A_0 = G^{f_0} / R^{c_0}$ and $B_0 = X^{f_0} / S^{c_0}$. Then he selects $a_1 \leftarrow \mathbb{Z}_q$,
computes $A_1 = G^{a_1}$, $B_1 = X^{a_1}$, $c = \mathcal{H}(A_0, B_0, A_1, B_1)$, $c_1 = c - c_0$ and
$f_1 = a_1 + c_1 r$. The proof consists of $(c_0, c_1, f_0, f_1)$ and verification consists of verifying
whether $c_0 + c_1 = \mathcal{H}(G^{f_0}/R^{c_0}, X^{f_0}/S^{c_0}, G^{f_1}/R^{c_1}, X^{f_1}/(S/G)^{c_1})$.
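
The honest prover and the verifier for the case of a vote for 1 can be sketched as follows. The toy group, the SHA-256 stand-in `H` and the helper names are assumptions for illustration, and the verification equation is the one obtained by recomputing the four commitments from the definitions above.

```python
import hashlib
import secrets

# Toy parameters, assumed for illustration only: P = 2Q + 1 with P, Q prime,
# and G = 4 generates the subgroup of order Q.
P, Q, G = 2039, 1019, 4


def H(*items):
    """Hash integers into Z_Q; a stand-in for the random oracle."""
    data = "|".join(str(i) for i in items).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def div(num, base, e):
    """Compute num / base^e in the group."""
    return num * pow(base, -e, P) % P


def prove_one(X, R, S, r):
    """Weak disjunctive proof for (R, S) = (G^r, G * X^r), a vote for 1."""
    c0, f0, a1 = (secrets.randbelow(Q) for _ in range(3))
    A0 = div(pow(G, f0, P), R, c0)         # simulated branch "encrypts 0"
    B0 = div(pow(X, f0, P), S, c0)
    A1, B1 = pow(G, a1, P), pow(X, a1, P)  # real branch "encrypts 1"
    c = H(A0, B0, A1, B1)
    c1 = (c - c0) % Q
    return c0, c1, f0, (a1 + c1 * r) % Q


def verify(X, R, S, proof):
    c0, c1, f0, f1 = proof
    S_over_G = S * pow(G, -1, P) % P
    A0 = div(pow(G, f0, P), R, c0)
    B0 = div(pow(X, f0, P), S, c0)
    A1 = div(pow(G, f1, P), R, c1)
    B1 = div(pow(X, f1, P), S_over_G, c1)
    return (c0 + c1) % Q == H(A0, B0, A1, B1)
```

Note that neither the public key nor the ciphertext enters the hash, which is what the attacks in this section exploit.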

Application to Helios. The proof we just described is exactly the one used in 
Helios to guarantee that voters encode at most one vote for each candidate and 
it exhibits weaknesses that are similar to those described above, but with an even 
more dangerous effect. Consider an election organized by corrupted trustees who 
would like to influence the election outcome by adding (or removing) $m$ approvals
to a candidate of their choice. These trustees now have the freedom to choose
any public key and ciphertext of their choice that would allow them to compute
an encryption of $m$ and to prove that it is an encryption of 0 or 1. They can
achieve this as follows.

They first select $(a_0, b_0, a_1, b_1) \leftarrow \mathbb{Z}_q^4$, from which they compute the commitments
$A_0 = G^{a_0}$, $B_0 = G^{b_0}$, $A_1 = G^{a_1}$, $B_1 = G^{b_1}$, the challenge $c =
\mathcal{H}(A_0, B_0, A_1, B_1)$ and the private key $x = \frac{(b_0 + cm)(1 - m) - b_1 m}{a_0(1 - m) - a_1 m}$ and the public
key $X = G^x$. Using this public key, they select a random encryption of $m$ by
selecting $r \leftarrow \mathbb{Z}_q$ and computing $(R, S) = (G^r, G^m X^r)$. Eventually, they compute
the challenges $c_1 = \frac{b_1 - a_1 x}{1 - m}$ and $c_0 = c - c_1$ and the responses $f_0 = a_0 + c_0 r$
and $f_1 = a_1 + c_1 r$. It can be verified that $(c_0, c_1, f_0, f_1)$ form a proof that $(R, S)$
encrypts 0 or 1, while it actually encrypts an arbitrary $m$.

Furthermore, it can be observed that this proof, like the others that we pre- 
sented above, is indistinguishable from a regular one. 
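
A sketch of how colluding trustees could carry this out is given below. It assumes a toy group and a SHA-256 stand-in `H`, and it uses expressions for $x$ and $c_1$ that were derived here by solving the four verification equations for the chosen commitments; these expressions, like the function names, should be read as assumptions rather than as the authors' exact formulas.

```python
import hashlib
import secrets

# Toy parameters, assumed for illustration only: P = 2Q + 1 with P, Q prime,
# and G = 4 generates the subgroup of order Q.
P, Q, G = 2039, 1019, 4


def H(*items):
    """Hash integers into Z_Q; a stand-in for the random oracle."""
    data = "|".join(str(i) for i in items).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def div(num, base, e):
    """Compute num / base^e in the group."""
    return num * pow(base, -e, P) % P


def verify(X, R, S, proof):
    """Weak disjunctive Chaum-Pedersen check that (R, S) encrypts 0 or 1."""
    c0, c1, f0, f1 = proof
    S_over_G = S * pow(G, -1, P) % P
    A0, B0 = div(pow(G, f0, P), R, c0), div(pow(X, f0, P), S, c0)
    A1, B1 = div(pow(G, f1, P), R, c1), div(pow(X, f1, P), S_over_G, c1)
    return (c0 + c1) % Q == H(A0, B0, A1, B1)


def cheat(m):
    """Choose the key after the commitments so that an encryption of m
    carries an accepting 0-or-1 proof (requires m != 1 mod Q)."""
    if (1 - m) % Q == 0:
        raise ValueError("m = 1 is an honest vote; nothing to forge")
    while True:
        a0, b0, a1, b1 = (secrets.randbelow(Q) for _ in range(4))
        c = H(*(pow(G, e, P) for e in (a0, b0, a1, b1)))
        den = (a0 * (1 - m) - a1 * m) % Q
        if den != 0:
            break
    x = ((b0 + c * m) * (1 - m) - b1 * m) * pow(den, -1, Q) % Q
    X = pow(G, x, P)
    r = secrets.randbelow(Q)
    R, S = pow(G, r, P), pow(G, m, P) * pow(X, r, P) % P
    c1 = (b1 - a1 * x) * pow((1 - m) % Q, -1, Q) % Q
    c0 = (c - c1) % Q
    proof = (c0, c1, (a0 + c0 * r) % Q, (a1 + c1 * r) % Q)
    return x, X, R, S, proof
```

For example, `cheat(7)` returns a ballot that decrypts to $G^7$ but passes the 0-or-1 check.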

Other attack possibilities exist, based on the same techniques. For instance, 
a voter who does not know the election private key can build a ciphertext that 
encrypts a random value in $\mathbb{Z}_q$ and prove that it encrypts 0 or 1, which would
make the decryption procedure fail. We do not know however whether it is
possible to build such a proof in a way that is indistinguishable from a regular
one.$^2$

Encrypt + PoK. Adding a proof of knowledge of the plaintext/randomness to
a ciphertext in an IND-CPA secure public key encryption scheme is a common
way to yield a non-malleable encryption scheme.$^3$ We formalise this construction
and show that using wFS does not yield non-malleable encryption.

Definition 3 (Encrypt+PoK). Let $\mathcal{E}$ = (KeyGen, Enc, Dec) be a public-key
encryption scheme. Let $R((m, r), (Y, pk)) := (Y = \mathsf{Enc}(pk, m; r))$ be the relation
that $Y$ is an encryption of $m$ with randomness $r$ for public key $pk$, let $\Lambda$ be the
ciphertext space (or some suitable superset thereof) and let (Prove, Verify)
be a NIZK-PoK for this relation.

The Encrypt+PoK transformation is the following encryption scheme. 

KeyGen': Run KeyGen.

Enc'$(pk, m)$: Draw some randomness $r$ and create a ciphertext $E = \mathsf{Enc}(pk, m; r)$.
Create a proof $\pi \leftarrow \mathsf{Prove}(pk, E, m, r)$. The ciphertext is the pair $(E, \pi)$.

Dec'$(sk, E, \pi)$: First run Verify$(pk, E, \pi)$. If this fails (returns 0), output $\bot$ and
halt. Otherwise, return Dec$(sk, E)$.

Consider the ElGamal encryption scheme with weak Schnorr proofs of the
randomness used for encryption (which would allow one to extract the message),
which would be a weak variant of the TDH0 scheme [?]. In other words, a
ciphertext for a message $M$ under public key $X$ is $(G^r, M \cdot X^r, c, f)$ where
$c = \mathcal{H}(G^f / (G^r)^c)$. We can rerandomise such a ciphertext $(R, S, c, f)$ by picking
a random $u$ and setting the new ciphertext to be $(R \cdot G^u, S \cdot X^u, c, f + cu)$. The
new plaintext is the same as the old one as $S/R^x = M \cdot X^{r+u} / (G^{r+u})^x = M$ and
the proof still verifies as $c = \mathcal{H}(G^{f+cu} / (G^{r+u})^c) = \mathcal{H}(G^f / (G^r)^c)$. Clearly, this encryption
scheme is malleable.
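
The rerandomisation can be checked mechanically with the following sketch, which assumes a toy group, a SHA-256 stand-in `H` and hypothetical function names.

```python
import hashlib
import secrets

# Toy parameters, assumed for illustration only: P = 2Q + 1 with P, Q prime,
# and G = 4 generates the subgroup of order Q.
P, Q, G = 2039, 1019, 4


def H(*items):
    """Hash integers into Z_Q; a stand-in for the random oracle."""
    data = "|".join(str(i) for i in items).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def encrypt(X, M):
    """ElGamal ciphertext with a weak Schnorr proof of the randomness r."""
    r, a = secrets.randbelow(Q), secrets.randbelow(Q)
    R, S = pow(G, r, P), M * pow(X, r, P) % P
    c = H(pow(G, a, P))                    # weak: only the commitment
    return R, S, c, (a + c * r) % Q


def verify(ct):
    R, _, c, f = ct
    return c == H(pow(G, f, P) * pow(R, -c, P) % P)


def decrypt(x, ct):
    R, S, _, _ = ct
    return S * pow(R, -x, P) % P


def rerandomise(X, ct):
    """Map (R, S, c, f) to (R*G^u, S*X^u, c, f + c*u) for a fresh u != 0."""
    R, S, c, f = ct
    u = 1 + secrets.randbelow(Q - 1)
    return R * pow(G, u, P) % P, S * pow(X, u, P) % P, c, (f + c * u) % Q
```

The new ciphertext differs from the old one, still verifies and decrypts to the same message, without any knowledge of `r` or of the secret key.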

$^2$ Our current technique involves setting $c_0 = 0$. While such a ballot passes the current
Helios verifier, this could be detected in an audit.

$^3$ Though the exact form of non-malleability that is provided is unclear [?].

Application to Helios. The same rerandomisation technique can be applied to
current Helios ballots, giving a ballot privacy attack in the style of Cortier and
Smyth [?] (based on the same principles as the attacks described in [24,25]). Helios
ballots contain ElGamal ciphertexts $(R, S)$ with disjunctive Chaum-Pedersen
proofs $(c_0, c_1, f_0, f_1)$. To rerandomise such a ciphertext, pick a random $u$ and set
$R' = R \cdot G^u$, $S' = S \cdot X^u$, $f_0' = f_0 + c_0 u$ and $f_1' = f_1 + c_1 u$. Unlike previously
known rerandomisation techniques, this one does not make use of a repeated
ElGamal ciphertext or proof. It can be detected however by checking for repeated
hash values, just as for the previous attacks.
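
The following sketch applies this rerandomisation to a ballot for 1 and also shows why checking for repeated hash values detects it: the recomputed commitments, and hence the hash input, are unchanged. The toy group, the SHA-256 stand-in `H` and all function names are assumptions for illustration.

```python
import hashlib
import secrets

# Toy parameters, assumed for illustration only: P = 2Q + 1 with P, Q prime,
# and G = 4 generates the subgroup of order Q.
P, Q, G = 2039, 1019, 4


def H(*items):
    """Hash integers into Z_Q; a stand-in for the random oracle."""
    data = "|".join(str(i) for i in items).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def div(num, base, e):
    """Compute num / base^e in the group."""
    return num * pow(base, -e, P) % P


def commitments(X, R, S, proof):
    """Recompute (A0, B0, A1, B1), the hash input of the weak proof."""
    c0, c1, f0, f1 = proof
    S_over_G = S * pow(G, -1, P) % P
    return (div(pow(G, f0, P), R, c0), div(pow(X, f0, P), S, c0),
            div(pow(G, f1, P), R, c1), div(pow(X, f1, P), S_over_G, c1))


def verify(X, R, S, proof):
    return (proof[0] + proof[1]) % Q == H(*commitments(X, R, S, proof))


def vote_one(X):
    """Honest ballot for 1 with a weak disjunctive proof."""
    r, c0, f0, a1 = (secrets.randbelow(Q) for _ in range(4))
    R, S = pow(G, r, P), G * pow(X, r, P) % P
    A0, B0 = div(pow(G, f0, P), R, c0), div(pow(X, f0, P), S, c0)
    c1 = (H(A0, B0, pow(G, a1, P), pow(X, a1, P)) - c0) % Q
    return R, S, (c0, c1, f0, (a1 + c1 * r) % Q)


def rerandomise(X, R, S, proof):
    """Shift the ciphertext by u and adjust only the two responses."""
    c0, c1, f0, f1 = proof
    u = 1 + secrets.randbelow(Q - 1)
    return (R * pow(G, u, P) % P, S * pow(X, u, P) % P,
            (c0, c1, (f0 + c0 * u) % Q, (f1 + c1 * u) % Q))
```

The rerandomised ballot has a fresh ciphertext and fresh responses, but `commitments` returns the same tuple for both ballots.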

Further examples. The various attacks that we described above focus on ap- 
plications to the Helios voting system, which uses the weak Fiat-Shamir trans- 
formation in all proofs. We believe that these examples provide clear evidence 
that the weak Fiat-Shamir transformation should not be used in that context: 
in particular, we showed that malicious authorities can arbitrarily influence the 
outcome of an election, which is in clear contradiction with the universal verifia- 
bility properties expected from that system. In the next sections, we will focus on 
the properties of the strong Fiat-Shamir transformation and show the benefits 
that its adoption would provide for the Helios system. 

We stress that there are various other contexts in which the weak Fiat-Shamir 
transformation should not be used. For instance, similarly to our observation for 
the weak variant of the TDH0 scheme, the scheme resulting from the Naor-Yung
transformation [?] applied to ElGamal encryption may become malleable if the
weak Fiat-Shamir transformation is used, contradicting the level of desired secu- 
rity. We provide attacks against a concrete instantiation of that transformation 
in the full version of our paper. 

4 Simulation Sound Extractable Proofs 

The examples discussed in the previous section show that the wFS transform 
fails to offer even the most basic soundness properties in many contexts. We 
now investigate the soundness properties of the sFS transform. More precisely, 
we formulate the notion of simulation sound extractable proofs in the random 
oracle model and show its applications to the sFS transformation. Our definition 
draws inspiration from that of witness-extended emulation j2] in which the exis- 
tence of an extractor is demanded such that for any adversary returning a vector 
of statements and proofs, the extractor returns identically distributed elements 
along with the witnesses to the proven statements. However, the definition is
perhaps more appropriately viewed as the analogue of simulation sound
extractability as defined by Groth, which combines the simulation soundness
approach of Sahai 123, with proofs of knowledge.

We consider a malicious prover who may ask to see simulated proofs (as in
simulation-soundness). The extractor that we consider gets the transcript of a
run of the prover where the prover outputs several valid proofs together with
the transcript of random oracle queries. His goal is to extract witnesses of these
proofs. In the process, we allow the extractor to invoke and communicate with
copies of the prover that use the same randomness as the run it is trying to
extract from. This ability is what permits the knowledge extractor to fork the
prover's execution without giving the extractor access to the coins of the prover
(see footnote 4).

A bit more precisely, a malicious prover for a proof system P = (Prove, Verify)
is an algorithm A that expects access to two oracles: a hashing oracle and a
simulation oracle. Thus A may submit some string s and expects H(s) in return,
and it may also make simulation calls Simulate(Y) for any statement Y in the
language and expects to obtain a proof π such that Verify(Y, π) = 1. The prover
returns a pair of vectors (Y, π).

Definition 4 (Simulation Sound Extractability). Let P be a zero-knowledge
proof system with simulator S. We say that P is simulation sound
extractable (SSE) if there exists an extractor K such that for every prover A, K
wins the following game with non-negligible probability.

1. (Initial run.) The game selects a random string ω for A. It runs an instance
of A with the simulator S until A makes his output and halts. If A does
not output any proofs, any of the proofs do not verify (w.r.t. the instance
of S used as the random oracle) or any of A's statement/proof pairs (Y, π)
is such that π was the result of a Simulate(Y) query, then K wins the game
directly.

2. (Extraction.) The game runs an instance of K, giving it the transcript of all
queries in the initial run and the produced (Y, π) as input. K may repeatedly
make one type of query, invoke, in response to which the game runs a new
invocation of A on the same randomness ω that it chose for the initial run.
All queries made by these instances are forwarded to K, who can reply to
them.

3. K wins the game if it can output a vector of witnesses w that match the
statements Y of the initial run, i.e. for all i we have R(w_i, Y_i).

The following theorem confirms that the strong Fiat-Shamir transformation 
yields proof systems that satisfy the notion we described above. 

Theorem 1. Let Σ be a sigma protocol with a challenge space that is
exponentially large in the security parameter, special soundness and special honest
verifier zero-knowledge. Then sFS(Σ) is zero-knowledge and simulation sound
extractable with respect to expected polynomial-time adversaries.
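
As an illustration of the difference between the two transformations, the
following Python sketch applies both to the Schnorr protocol in a toy group. All
parameters and function names are assumptions chosen for this example. It
shows (a) that the strong variant hashes the statement together with the
commitment, (b) how special soundness yields a witness from two accepting
transcripts that share a commitment, which is what forking the prover provides
to an extractor, and (c) that under the weak variant a prover can fix the
commitment first and solve for a statement afterwards.

```python
import hashlib
import secrets

# Toy parameters (assumption): G = 4 generates the subgroup of prime order Q.
P, Q, G = 2039, 1019, 4


def H(*parts):
    data = ",".join(str(x) for x in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(x, strong=True):
    """Schnorr proof of knowledge of x for the statement Y = G^x."""
    Y = pow(G, x, P)
    a = secrets.randbelow(Q)
    A = pow(G, a, P)
    c = H(Y, A) if strong else H(A)      # sFS hashes the statement as well
    return Y, (A, (a + c * x) % Q)


def verify(Y, proof, strong=True):
    A, f = proof
    c = H(Y, A) if strong else H(A)
    return pow(G, f, P) == A * pow(Y, c, P) % P


def extract(c1, f1, c2, f2):
    """Special soundness: same commitment, different challenges -> witness."""
    return (f1 - f2) * pow((c1 - c2) % Q, -1, Q) % Q


def forge_weak():
    """Weak FS only: choose the commitment, then compute a matching statement."""
    while True:
        A = pow(secrets.randbelow(P - 2) + 2, 2, P)   # a square, so in the subgroup
        c = H(A)
        if c != 0:
            break
    f = secrets.randbelow(Q)
    Y = pow(pow(G, f, P) * pow(A, -1, P) % P, pow(c, -1, Q), P)  # (G^f / A)^(1/c)
    return Y, (A, f)


Y, proof = prove(123)
print(verify(Y, proof))                    # honest strong proof is accepted
Y_forged, forged = forge_weak()
print(verify(Y_forged, forged, strong=False))
```

The forged weak proof is accepted although the forger never used a discrete
logarithm of the statement it outputs. Under the strong variant the same trick
fails because the challenge cannot be computed before the statement is fixed.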

Applications. The Schnorr and Chaum-Pedersen protocols are clearly both sigma
protocols with special soundness and special honest verifier zero knowledge so
Theorem 1 applies and the sFS versions of these protocols are SSE proofs. For
disjunctive Chaum-Pedersen, the challenge is the actual c obtained from the
verifier and the response is the tuple f = (f_0, f_1). This is a sigma protocol with
special soundness and almost special honest verifier zero knowledge: almost,
because in our definition the simulator chooses c, f independently and uniformly

4 Although not necessary for this paper, hiding the adversary’s randomness from the 
extractor can be helpful in other contexts to prove separation results. 


How Not to Prove Yourself 637 


at random yet if c ≠ c_0 + c_1 then the resulting proof will not verify, patched
oracle or not. We could fix this by not sending c_1 in the response and having the
verifier recompute c_1 = c - c_0. We will ignore this point as it is easy to see that,
if the simulator chooses c, f at random and then adjusts c_0, all relevant theorems
still hold. In particular, the sFS transformation of disjunctive Chaum-Pedersen
is still a simulation-sound extractable proof.

Encrypt + PoK. 

With this notion we can restore the folklore result that appending a PoK to 
an IND-CPA scheme gives a NM-CPA one, if the PoK is simulation-sound ex- 
tractable. For space reasons the proof is only in the full version of our paper. 

Theorem 2. Let E be an IND-CPA secure encryption scheme and P be a
simulation-sound extractable NIZK-PoK for the encryption relation. Then E_P is
non-malleable (NM-CPA) secure with respect to expected polynomial-time adversaries.
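
To make the Encrypt + PoK paradigm concrete, here is a minimal Python sketch
under stated assumptions: ElGamal in a toy group as the IND-CPA scheme, and a
strong Fiat-Shamir Schnorr proof of knowledge of the encryption randomness r as
the proof (knowing r determines the message as S / Y^r). The parameters, the
hash encoding and the function names are illustrative choices, not taken from
the paper.

```python
import hashlib
import secrets

# Toy parameters (assumption): G = 4 generates the subgroup of prime order Q.
P, Q, G = 2039, 1019, 4


def H(*parts):
    data = ",".join(str(x) for x in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def keygen():
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)


def encrypt(Y, m):
    """ElGamal ciphertext (R, S) for a subgroup element m, plus a proof (A, f)."""
    r = secrets.randbelow(Q)
    R, S = pow(G, r, P), m * pow(Y, r, P) % P
    a = secrets.randbelow(Q)
    A = pow(G, a, P)
    c = H(Y, R, S, A)                  # strong FS: the whole statement is hashed
    return R, S, A, (a + c * r) % Q


def decrypt(x, Y, ciphertext):
    """Return the message, or None if the attached proof does not verify."""
    R, S, A, f = ciphertext
    c = H(Y, R, S, A)
    if pow(G, f, P) != A * pow(R, c, P) % P:
        return None
    return S * pow(R, -x, P) % P


x, Y = keygen()
m = pow(G, 42, P)
print(decrypt(x, Y, encrypt(Y, m)) == m)
```

Because the challenge depends on (Y, R, S, A), changing S or R changes the
challenge, so a mauled ciphertext carrying the old proof is rejected except with
probability about 1/Q, which is non-negligible only because the group here is a
toy.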

5 Ballot Privacy in Helios 

In this section we propose a modification to Helios and prove that it satisfies 
ballot privacy in the model of single-pass voting of Bernhard et al.

Single-Pass Schemes. A single-pass voting scheme is a protocol consisting of 
the following algorithms and execution protocol for a set V of voters, a set T of 
trustees and a bulletin board B. The class of single-pass schemes includes not 
only Helios |S| but also several other cryptographic voting schemes |2DI3 1)12713 1 \ . 
Single-pass voting models two of the most popular approaches to cryptographic 
voting, homomorphic tallying and mix-nets. Voters need only read a single mes- 
sage off the board (the election specification and public keys) and post a single 
message (their ballot) in return. We assume some underlying voter authentication
mechanism (see footnote 5).

Setup(1^λ) is an algorithm to create public parameters for the election and secret
ones for all trustees.

Setup produces one public output Y known as the public key of the election
and a secret output x_i for each trustee T_i in the set of trustees T. The secret
outputs of all trustees together are known as the secret key of the election.

Vote(id, v, Y) is a probabilistic algorithm run by voters to prepare their votes
for submission. It takes as input a voter's identity id, a vote v and public
information Y and outputs a ballot s ← Vote(id, v, Y).

Validate(b, s) models the algorithm run by the bulletin board during voting. Its
inputs are the current board state b and the submitted ballot s. It returns 1
if the submission is deemed valid (given the current state of the board) and
0 otherwise.

5 In the election of the president of UC Louvain |§| using Helios, authentication was 
handled by the university’s existing infrastructure. As such it escapes cryptographic 
modelling and we choose not to model authentication (in particular we do not wish 
to assume a PKI) . 


638 D. Bernhard, O. Pereira, and B. Warinschi 


Tally(b) is a tallying protocol that is run by the trustees. Its inputs are the board
so far and the private data kept by the trustees from the setup phase.

Result(b) is a deterministic algorithm that takes a bulletin board b of a completed
election and returns the result of the election, or a special symbol ⊥ if the
board does not contain a valid result.

A single-pass scheme is executed as follows. 

1. Setup phase. The trustees run the Setup algorithm and post the public key 
Y to the bulletin board. 

2. Voting phase. Each voter may proceed as follows or abstain. He reads public
key Y off the board and computes a ballot s ← Vote(id, v, Y) where id is his
identity and v is his vote, and submits s to the board.

The board runs Validate(b, s) on every submission it receives and appends
valid ones to its state.

3. Tallying phase. The trustees run the Tally protocol and may post to the 
board. 

A single-pass protocol is correct w.r.t. a result function ρ if as long as everyone
follows the protocol, with overwhelming probability (in the security parameter)
none of the algorithms abort, Result returns a result when executed on the board
at the end of the tallying phase and this result corresponds to ρ evaluated on
the votes cast by the voters (see footnote 6).
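
The five algorithms and the three phases can be sketched in a few lines of
Python. This is a toy instantiation under loud assumptions: a single trustee,
exponential ElGamal over Z_P^* with P = 2^127 - 1 (not a prime-order group), no
proofs of any kind, and a dictionary as bulletin board. The names and the board
layout are hypothetical; the sketch only illustrates the interface and the
correctness condition, with the result function counting the 1-votes.

```python
import secrets

# Toy parameters (assumption): P is the Mersenne prime 2^127 - 1, G = 3.
P, G = 2**127 - 1, 3


def setup():
    x = secrets.randbelow(P - 2) + 1
    return pow(G, x, P), x                 # public key Y, trustee secret x


def vote(voter_id, v, Y):
    r = secrets.randbelow(P - 1)
    return (voter_id, pow(G, r, P), pow(Y, r, P) * pow(G, v, P) % P)


def validate(board, s):
    """Return 1 unless the identity or the ciphertext already appears."""
    seen_ids = {t[0] for t in board["ballots"]}
    seen_cts = {t[1:] for t in board["ballots"]}
    return int(s[0] not in seen_ids and s[1:] not in seen_cts)


def tally(board, x):
    """Homomorphic tally: multiply ciphertexts, decrypt, search the exponent."""
    R = S = 1
    for _, r_i, s_i in board["ballots"]:
        R, S = R * r_i % P, S * s_i % P
    M = S * pow(R, -x, P) % P              # equals G^(sum of votes)
    n = len(board["ballots"])
    board["result"] = next(t for t in range(n + 1) if pow(G, t, P) == M)


def result(board):
    return board["result"]                 # None plays the role of the symbol ⊥


def run_election(votes):
    Y, x = setup()                                         # 1. setup phase
    board = {"key": Y, "ballots": [], "result": None}
    for voter_id, v in votes.items():                      # 2. voting phase
        s = vote(voter_id, v, board["key"])
        if validate(board, s):
            board["ballots"].append(s)
    tally(board, x)                                        # 3. tallying phase
    return result(board)


print(run_election({"alice": 1, "bob": 0, "carol": 1}))
```

With honest participants the printed result equals the number of 1-votes, which
is the correctness condition for this choice of result function. Nothing in the
sketch prevents a voter from encrypting a value other than 0 or 1; this is the
role of the ballot validity proofs discussed in the rest of the paper.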

Ballot Privacy. We base our definition of ballot privacy on previous work
in this area by Bernhard et al. [17,32]. Ballot privacy is defined by means of a
cryptographic indistinguishability game. The new feature of our definition is that 
it can deal with dishonest trustees; we introduce a simulator to handle tallying 
in this case. 

Definition 5 (Ballot Privacy). A single-pass protocol for n trustees and any
number of voters has ballot privacy against up to m < n dishonest trustees if there is
a simulator S such that for any efficient adversary A, the advantage Pr[A wins] -
1/2 against the following indistinguishability game is negligible (as a function of the
security parameter). The simulator S is given black-box access to the adversary A
and may invoke further copies of A using the same randomness as was used in the
main run in the security game. We assume static corruption of trustees: the sets of
honest and dishonest trustees are fixed in advance. The adversary can adaptively
choose voters to be honest or dishonest, however.

Setup Phase. The challenger picks a bit β ← {0, 1} uniformly at random. He
sets up two bulletin boards L and R. The adversary is given access to either
L if β = 0 or to R if β = 1.

The trustees jointly run the Setup protocol, the challenger playing the honest
trustees and the adversary, the corrupt ones. This produces some output Y on
the visible board. The challenger then copies Y from the visible board to the
hidden one. If the setup phase fails to complete, the adversary loses the game.

6 One may also want ρ to operate on (v, id) pairs: in the Helios election at UC Louvain,
votes from students, faculty and staff were weighted differently.

Voting Phase. The adversary may make two types of queries.

Vote(id, v_L, v_R) Queries. The adversary provides a voter identity id and
two votes (v_L, v_R). The challenger runs b_L ← Vote(id, v_L, Y) and
b_R ← Vote(id, v_R, Y), where Y is the public key of the election that can be
computed using Keys on the public information on the board from the
setup phase.

The ballots b_L and b_R are submitted to the corresponding boards which
process them normally (run Validate and append the ballot if it passes
validation).

Ballot(id, b) Queries. These are queries made on behalf of corrupt voters.
Here the adversary provides a ballot b. The challenger first submits b to
the board visible to the adversary, which validates it and appends it if
validation is successful. If the ballot successfully validates on the visible
board, the challenger also submits the ballot to the invisible board which
again validates the ballot and appends it if successful.

Tallying Phase. If the adversary sees the L board (β = 0) then tallying can
take place as normal. The trustees execute the Tally protocol, the challenger
playing the honest ones and the adversary, the dishonest ones.

If the adversary sees the R board, the challenger starts up the simulator S
and passes it both the L and R boards and the state of the honest trustees.
In the random oracle model, the simulator is responsible for the random
oracle from this point onwards (and gets a list of all previously queried
input/output pairs). The simulator acts on behalf of the honest trustees
from now onwards and may post to the board.

At the end of the game the adversary may make a guess of β and wins if his
guess is correct.
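
The bookkeeping of the two boards in this game can be sketched as follows. The
Python class below is a hypothetical illustration: it models only the voting
phase and the final guess, takes the Vote and Validate algorithms of the scheme
under test as parameters, and omits the setup and tallying phases (and therefore
the simulator) entirely.

```python
import secrets


class BallotPrivacyChallenger:
    """Voting-phase bookkeeping of the game; boards[0] is L and boards[1] is R."""

    def __init__(self, vote, validate, public_key):
        self.vote, self.validate, self.Y = vote, validate, public_key
        self.beta = secrets.randbelow(2)      # hidden bit β
        self.boards = ([], [])

    def _post(self, board, ballot):
        ok = bool(self.validate(board, ballot))
        if ok:
            board.append(ballot)
        return ok

    def visible_board(self):
        """The board the adversary gets to see: L if β = 0, R if β = 1."""
        return list(self.boards[self.beta])

    def vote_query(self, voter_id, v_left, v_right):
        """Honest voter: a ballot for v_left goes to L, one for v_right to R."""
        self._post(self.boards[0], self.vote(voter_id, v_left, self.Y))
        self._post(self.boards[1], self.vote(voter_id, v_right, self.Y))

    def ballot_query(self, ballot):
        """Corrupt voter: post to the visible board first, then to the hidden one."""
        visible, hidden = self.boards[self.beta], self.boards[1 - self.beta]
        if self._post(visible, ballot):
            self._post(hidden, ballot)

    def finish(self, guess):
        return guess == self.beta             # True if the adversary wins
```

As a sanity check, instantiating the challenger with a "scheme" that posts votes
in the clear, for example `vote = lambda i, v, Y: (i, v)`, lets an adversary win
with certainty: after `vote_query("alice", 0, 1)` the vote shown on the visible
board equals β.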

We propose to fix Helios by changing all proofs to their strong counterparts. 
This allows us to state the following theorem. The proof along with a detailed 
description of modified Helios in the single-pass model can be found in the full 
version of our paper. 

Theorem 3. In the random oracle model, the modified Helios (using strong
proofs) satisfies ballot privacy against up to m = n - 1 dishonest trustees,
assuming that DDH is hard in the underlying group.

6 Conclusion 

The prominence of Helios (it has been used in several real elections, notably in 
the election of the IACR board of directors) justifies the level of attention it 
has recently received. Results are divided between finding attacks against ballot
privacy (e.g. the method of casting related ballots [23,33] which we further refine
in this paper) and proposing modifications that enable rigorous security proofs
[23,17,34]. Our paper seems to be the natural convergence point. We identify the
use of weak Fiat-Shamir proofs as a source of attacks much stronger than all those
previously proposed: we have presented new and unforeseen consequences of
these weak proofs and we have shown that switching to their strong counterpart
allows for a proof of ballot secrecy for Helios, and provides a crucial assumption
on which existing verifiability analyses of Helios rely. In the process, we
have made several conceptual contributions: we have defined simulation sound
extractability in the random oracle model, proved that the strong Fiat-Shamir
transformation yields secure non-interactive zero-knowledge proofs of knowledge,
and justified the new notion through applications that include the Enc+PoK
paradigm.

In the remainder of this section we discuss two points that naturally arise 
from our work. 

Usability of wFS. Our results discourage the use of wFS proofs as they may lead to
failures in the systems that employ them. Nonetheless, the transformation works
well for its original application (and its generalizations) in constructing
signature schemes from identification schemes, since the statement (essentially the
verification key) is fixed in advance. It is interesting to find other settings where
wFS can actually be used safely. An intriguing possibility is to exploit malleability
of wFS proofs as, for example, in the recent work of Chase et al. that relies on
controlled malleability of (standard model) non-interactive zero-knowledge proofs.
A necessary first step in this direction is understanding precisely what the level
of malleability of wFS proofs is, which we leave for further work.

Practical impact of our attacks. As Helios in its current form has been used in 
real elections, a discussion of the impact of our attacks in practice is in order. 
We note that our attacks have been tested and succeeded on the current version 
of the Helios system on http://vote.heliosvoting.org 

Our denial of service attacks may only have an impact on future elections: 
as far as we know, all Helios elections led to the successful computation of a 
tally. Regarding our attack on privacy, the scale and outcome of all known real- 
world elections based on Helios rule out the possibility of effectively violating 
the privacy of voters through ballot copying. We also checked the 2010 IACR 
bulletin board and verified that it does not contain any copied ballot. 

Our most realistic new attack challenges the verifiability of elections: we 
showed that corrupted authorities colluding with a single voter can submit an 
encryption of an arbitrary (positive or negative) number of approvals for any 
candidate, and that this encryption is indistinguishable from a normal one. This 
attack could have a decisive impact on approval elections, where the addition of a 
reasonable number of votes for a single candidate can easily remain undetected. 

Many important Helios elections did not use approval voting, though (e.g., the 
UCL president election and the IACR 2010 election): in those elections, voters 
were only allowed to select a limited number of candidates. The capability to 
submit a single malicious ciphertext has a much more limited effect in that 
case, due to the need to produce an overall proof of the validity of the ballot 
besides the individual 0/1 proofs. In this context, two possibilities are left to an 
attacker: either (1) cheat on an individual proof, that is, if allowed to choose up 
to n candidates, encrypt n votes for a single candidate, 0 for all others, and the 
overall proof could still be built normally; or (2) cheat on the overall proof, that
is, select as many candidates as desired and fake the overall proof. The result
of these limited manipulations could not have changed the outcome of the two 
particular elections mentioned above. 

Extending our attack to more than a single ciphertext does not seem immedi- 
ate. Indeed, our attack requires selecting the private key as a function of the hash 
of all the commitments in one proof. As a result, building two proofs based on 
different commitments would require using different election keys, which would 
not be possible in a single election. 

Our second most damaging attack relies on authorities that gain access to the 
randomness that is used by all voters in order to encrypt their messages. This 
could possibly be achieved by hiding a function that sends this randomness in the 
JavaScript code sent by the Helios server to the voters for the ballot preparation, 
or by forcing server-side encryption. Though more demanding, the effect of this 
attack can also be more severe as the single actively corrupted voter now only 
needs to submit a regular ballot. 

In all cases, including for approval elections (such as the 2011 IACR election) 
for which our first attack on verifiability applies, there remain possibilities to 
remove the concerns that our attacks may raise. For instance, a (possibly inde- 
pendent) set of trustees could be asked to run a mixnet on the ciphertexts posted 
on the bulletin board of the considered elections, which could then be followed 
by the individual decryption of all shuffled ballots. An invalid ballot would then 
be detected immediately, and the trustees would not be able to cheat on the 
decryption of a second ciphertext. 

The existence of such a possibility shows that we still are in a better situation 
than the one obtained with postal voting. Here, the trustees still have a possi- 
bility to demonstrate that they did not manipulate the election. That would be 
much harder for postal voting, where there is no practical way for the tallying 
officers to demonstrate that the tally they announce actually corresponds to the 
authentic ballots.

Abstract. Sequential aggregate signature schemes allow n signers, in
order, to sign a message each, at a lower total cost than the cost of n
individual signatures. We present a sequential aggregate signature scheme
based on trapdoor permutations (e.g., RSA). Unlike prior such proposals,
our scheme does not require a signer to retrieve the keys of other
signers and verify the aggregate-so-far before adding its own signature.
Indeed, we do not even require a signer to know the public keys of other
signers!

Moreover, for applications that require signers to verify the aggregate
anyway, our schemes support lazy verification: a signer can add its own
signature to an unverified aggregate and forward it along immediately,
postponing verification until load permits or the necessary public keys 
are obtained. This is especially important for applications where signers 
must access a large, secure, and current cache of public keys in order to 
verify messages. The price we pay is that our signature grows slightly 
with the number of signers. 

We report a technical analysis of our scheme (which is provably se- 
cure in the random oracle model), a detailed implementation-level spec- 
ification, and implementation results based on RSA and OpenSSL. To 
evaluate the performance of our scheme, we focus on the target applica- 
tion of BGPsec (formerly known as Secure BGP), a protocol designed for 
securing the global Internet routing system. There is a particular need 
for lazy verification with BGPsec, since it is run on routers that must 
process signatures extremely quickly, while being able to access tens of 
thousands of public keys. We compare our scheme to the algorithms 
currently proposed for use in BGPsec, and find that our signatures are
considerably shorter than nonaggregate RSA (with the same sign and verify
times) and have an order of magnitude faster verification than nonaggregate
ECDSA, although ECDSA has shorter signatures when the number
of signers is small.

1 Introduction 

Aggregate signature schemes allow n signers to produce a digital signature that
authenticates n messages, one from each signer. This can be securely accomplished
by simply concatenating together n ordinary digital signatures, individually
produced by each signer. An aggregate signature is designed to maintain
the security of this basic approach, while having length much shorter than n
individual signatures. To achieve this, many prior schemes (e.g., [LMRS04, Nev08])
relied on a seemingly innocuous assumption; namely, that each signer needs to
verify the aggregate signature so far, before adding its own signature on a new
message. In this paper, we argue that this can make existing schemes unviable
for many practical applications (in particular, for BGPsec [Lep12] / Secure BGP
[KLS00]) and present a new scheme based on trapdoor permutations like RSA
that avoids this assumption. In fact, our scheme remains secure even if a signer
does not know the public keys of the other signers.


1.1 Aggregate Signatures from Trapdoor Permutations 

Boneh, Gentry, Lynn, and Shacham [BGLS03] introduced the notion of aggregate
signatures, in which individual signatures could be combined by any third
party into a single constant-length aggregate. The [BGLS03] scheme is based on
the bilinear Diffie-Hellman assumption in the random oracle model [BR93].
Subsequent schemes [LMRS04, Nev08] were designed for the more standard
assumption of trapdoor permutations (e.g., RSA [RSA78]), but in a more restricted
framework where third-party aggregation is not possible. Instead, the signers
work sequentially: each signer receives the aggregate-so-far from the previous
signer and adds its own signature (see footnote 1).

Lysyanskaya, Micali, Reyzin, and Shacham [LMRS04] constructed the first
sequential aggregate signature scheme from trapdoor permutations, with a proof
in the random oracle model (see footnote 2). However, their scheme has two
drawbacks: the trapdoor permutation must be certified (when instantiating the
trapdoor permutation with RSA, this means that each signer must either prove
certain properties of the secret key or else use a long RSA verification exponent), and
each signer needs to verify the aggregate-so-far before adding its own signature.
Neven [Nev08] improved on [LMRS04] by removing the need for certified
trapdoor permutations, but the need to verify before signing remained. Indeed, a
signer who adds its own signature to an unverified aggregate in both [LMRS04]
and [Nev08] (or, indeed, in any scheme that follows the same design paradigm)
is exposed to a devastating attack: an adversary can issue a single malformed

1 The need for the random oracle model was removed by Lu, Ostrovsky, Sahai,
Shacham, and Waters [LOS+06], who constructed sequential aggregate signatures
from the bilinear Diffie-Hellman assumption; however, it is argued in [CHKM10]
that this improvement in security comes at a considerable efficiency cost. See
also |H.S()DI( 1S( iOhj for other proposals based on less common assumptions.

2 Bellare, Namprempre, and Neven [BNN07] showed how the schemes of [BGLS03]
and [LMRS04] can be improved through better proofs and slight modifications.

aggregate to the signer, and use the signature on that malformed message to
generate a valid signature on a message that the signer never intended to sign
(we describe the attack in the full version of the paper [BGR11]).

The nonsequential scheme of [BGLS03] does not, of course, require verification
before signing. The only known sequential aggregate scheme to not require
verification before signing is the history-free construction of Fischlin, Lehmann,
and Schröder [FLS11] (concurrent with our work), but it, like [BGLS03], requires
bilinear Diffie-Hellman.

Thus, the advantages of basing the schemes on trapdoor permutations (par- 
ticularly a more standard security assumption and fast verification using low- 
exponent RSA) are offset by the disadvantage of requiring verification before 
signing. We argue below that this disadvantage is serious. 


1.2 The Need for Lazy Verification 

In applications with a large number of possible signers, the need to verify before 
signing can introduce a significant bottleneck, because each signer must retrieve 
the public keys of the previous signers before it can even begin to run its signing 
algorithm. Worse yet, signers need to keep their large caches of public keys secure 
and current: if a public key is revoked and a new one is issued, the signer must 
first obtain the new key and verify its certificate before adding its own signature 
to the aggregate. 

A Key Application: BGPsec. Sequential aggregate signatures are particularly
well-suited for BGPsec [Lep12] (formerly known as the Secure Border
Gateway Protocol (S-BGP) [KLS00]), a protocol being developed to improve
the security of the global Internet routing system. (This application was
mentioned in several works, including [BGLS03, LOS+06, Nev08], and explored
further in ESHaO.) In BGPsec, autonomous systems (ASes) digitally sign routing
announcements listing the ASes on the path to a particular destination. An
announcement for a path that is n hops long will contain n digital signatures,
added in sequence by each AS on the path. (Notice that the length of the
BGPsec message even without the signatures increases at every hop, as each AS
adds its name to the path, as well as extra information to the material in the
routing message like its “subject key identifier”, a cryptographic fingerprint
that is used to look up its public key in the PKI [Lep12].) The BGPsec protocol
is faced with two key performance challenges:

1. Obtaining public keys. BGPsec naturally requires routers to have access to 
a large number of public keys; indeed, a routing announcement can contain 
information from any of the 41,000 ASes in the Internet |( ’( )X08j (this num- 
ber is according to the dataset retrieved in 2012). Certificates for public keys 
are regularly rolled over to maintain freshness, and must be retrieved from 
a distributed PKI infrastructure [Hus12]. Caching more than 41,000 public
keys is expensive for a memory-constrained device like a router (which often
does not have a hard drive or other secondary storage jKRflflj). Furthermore,
whenever a router sees a BGPsec message containing a key that is not in
its cache, it incurs non-trivial delay on certificate retrieval (from a distant
device that hosts the PKI) and verification.

2. Dealing with routing table “dumps”. When a link from a router to its
neighboring router fails, the router receives a dump of the full routing table,
often containing more than 300,000 routes, from its neighbors. Because
routers are CPU- and memory-constrained devices, dealing with these
huge routing table dumps incurs long delays (up to a few minutes, even
with plain, insecure BGP jBHMTflffi). The delays are exacerbated if
cryptographic signing and verifying is added to the process, and even more so
when a router comes online for the first time (or after failure) and needs to
also retrieve and authenticate public keys for all the ASes on the Internet.

To deal with these issues, the BGPsec protocol gives a router the option to 
perform lazy verification: that is, to immediately sign the routing announcement 
with its own secret key, and to delay verification until a later time, e.g., when 
(a) it has time to retrieve the public keys of the other signers, or (b) when the 
router itself is less overloaded and can devote resources to verification [rmsi . It 
is important to note that lazy verification by one router need not hurt others: if 
a router has not verified a given announcement, routers further in the chain can 
verify it for themselves. 

While there is legitimate concern that permitting lazy verification may cause 
routers to temporarily adopt unverified paths, the alternative may be worse: for- 
bidding lazy verification can lead to problems with global protocol convergence 
(agreement on routes in the global Internet), because of routers that delay their 
announcements significantly until they can verify signatures (e.g., during rout- 
ing table dumps, or while waiting to retrieve a missing certificate). Such delays 
create their own security issues, enabling easier denial of service attacks and 
traffic hijacking during the long latency window. Thus, even though BGPsec 
recommends that every router eventually verifies BGPsec messages, requiring 
that routers always verify before signing and re-announcing BGPsec messages is 
considered a nonstarter by the BGPsec working group [Sri12, Section 8.2.1]. Lazy
verification is written into the BGPsec protocol specification as follows [Lep12,
Section 7]:

...it is important to note that when a BGPSEC speaker signs an outgoing 
update message, it is not attesting to a belief that all signatures prior to 
its are valid. 

Requirement: No Public Keys in the Signing Algorithm! Note that 
the primary obstacle here is not only verification time (which can perhaps be 
improved through batching and, anyway, can be considerably faster than signing 
time when using low-exponent RSA), but also the need to obtain public keys. 
Thus, lazy verification also requires that prior signers’ public keys are not used 
in the signing algorithm (e.g., hashed with the message as in [LMRS04, Nev08]).

Requirement: No Security Risk from Signing Unverified Aggregates! 

As we already mentioned, a signer who adds its own signature to an unverified 
aggregate in the schemes of jhMRS04] and |Nev()8j is exposed to a devastating 
attack. We already discussed how lazy verification may cause a signer to do so. 
Moreover, even without lazy verification, BGPsec may sometimes require a signer 
to add its own signature to an aggregate that is invalid. One such situation is 
when a router knowingly adopts a path that fails verification — for example, if it 
is the only path to a particular destination (the specification allows this |Lepl2| 
Section 5]). It will then add its own signature to the invalid one, because a 
“BGPSEC router should sign and forward a signed update to upstream peers if 
it selected the update as the best path, regardless of whether the update passed 
or failed validation (at this router)” |Sril2L Section 8.2.1]. The need to sign a 
possibly invalid aggregate also arises in the case each message is signed by two 
different signature schemes (as will happen during transition times from one 
signature algorithm to another), and “one set of signatures verifies correctly and 
the other set of signatures fails to verify.” In such a case the signer should still 
“add its signature to each of the [chains] using both the corresponding algorithm 
suite” [Lep12, Section 7]. Even if all BGPsec adopters avoid lazy verification
and always verify before signing, these guidelines make it impossible to adopt an 
aggregate signature scheme that does not permit signing unverified aggregates, 
because of the possibility of attack. In other words, lazy verification is still needed 
for security even if no one uses it for efficiency! 

Our Goal. We note that lazy verification is permitted by the trivial solution 
of concatenating individual ordinary signatures, by aggregate signature schemes 
defined in [BGLS03], and by history-free aggregate signature schemes defined
in [FLS11]. None of the above schemes requires the current signer to know
anything about the previous signers: neither their public keys nor the messages
they signed.³ Our goal is to obtain the same advantages, while relying on a more
basic security assumption than the bilinear Diffie-Hellman of [BGLS03, FLS11]
and saving space as compared to the trivial solution. 


3 Identity-based aggregate signatures [YCK04, XZF05, CLW05, CLGW06,
Her06, GR06, BGOY07, HLY09, SVSR10, BJ10] also remove the need for
obtaining public keys and have been proposed for use in BGPsec. However, agreeing 
on the secret-key-issuing authority for the global Internet seems politically infeasi- 
ble. Moreover, on a technical level, the proposals either require interaction among 
signers or are based on bilinear pairings. Interactive signatures would significantly 
complicate the protocol. And if we are willing to rely on bilinear pairings, [BGLS03]
already gives us an excellent choice that allows for lazy verification. 

Synchronized aggregate signatures (identity-based ones of [GR06] and regular ones
of [AGH10]) also allow for lazy verification, but require a common nonce for all
signers that, if repeated, breaks the security of the scheme. Implementing such a 
nonce in BGPsec presents its own challenges, because each signer has to ensure it 
never reuses a nonce, or else its secret key is at risk. The schemes are also pairing- 
based. 

1.3 Overview of Our Contributions

We present a sequential aggregate signature scheme that is secure even with lazy 
verification, based on any trapdoor permutation (such as RSA). Moreover, as in 
the nonsequential scheme of [BGLS03] and the history-free scheme of [FLS11],
our signers do not need to know anything about each other, not even each
other’s public keys. To achieve this, we modify Neven’s scheme [Nev08] by
randomizing the H-hash function with a fresh random string per signer, which
becomes a part of the signature, similarly to Coron’s PFDH [Cor02] (Section 3).
Our modification allows each signer to sign without verifying, and without even
needing to know the public keys of the signers that came before it, avoiding,
in particular, the attack on [LMRS04, Nev08].

Although the ultimate goal in aggregate signatures is to produce schemes 
whose signature length is independent of the number of signers, signatures in 
our scheme grow slightly with the number of signers. However (as also pointed
out by [Nev08]), while a constant-length aggregate signature is a theoretically
interesting goal, what usually matters in practice is the combined length of sig- 
natures and messages, because that’s what verifiers receive: signatures rarely 
live on their own, separately from the messages they sign. And the combined 
length of messages, if they are distinct, grows linearly with the number of sign- 
ers, so the total growth of the amount of information received by the verifier is 
anyway linear. What matters, then, is how fast this linear growth is; below we 
derive parameters that show it to be much smaller than when ordinary trapdoor- 
permutation-based signatures are used as in the trivial solution. 

We make the following contributions: 

Generic Randomized Scheme. We present the basic version of our scheme,
which requires each signer to append a truly random string to the aggregate
(Section 3). Our scheme is as efficient for signing and verifying (per signer)
as ordinary trapdoor-permutation-based signatures, like the Full-Domain-Hash
(FDH, [BR93, Section 4]). We prove security (Section 4) in the random oracle
model, based on the same assumption of trapdoor permutations (or claw-free
permutations for a tighter security reduction) as in [Nev08]. Our security proof
is more involved, because the reduction cannot know the public keys of other
(adversarial) signers during the signature queries. We should note that our proof
technique also shows that Neven’s scheme need not hash other signers’ public
keys in the signing algorithm (however, Neven’s scheme still fails under lazy
verification).

Shortening the Randomness. We show that the per-signer random string can
be shorter if it is made input-dependent (Section 5), ensuring that a given signer
never produces two different signatures on the same input. The idea of
input-dependent randomness has been used before in signature schemes (e.g., [KW03,
Section 4]); however, our application requires a new combinatorial argument to
show security.

Instantiating with RSA. In the full version of the paper [BGR11b] we show
how to instantiate our schemes with practical trapdoor permutations like RSA, 
which have slightly different domains for different signers. 

Detailed Specification. We provide a full, parameterized step-by-step specification
of the truly-random and input-dependent-random versions of our signature
when instantiated with RSA (see the full version of the paper [BGR11b],
where we also provide guidelines on choosing parameters such as bit lengths).

Implementation, Benchmarking and Practical Considerations. We implement
our specification as a module in OpenSSL (Section 6); the implementation
is available from [BGR11a]. We compare our implementation’s performance
to other potential solutions that allow for lazy verification; namely, [BGLS03],
and the “trivial” solution of using n RSA or ECDSA signatures (the two algo- 
rithms currently proposed for use in implementations of BGPsec dHEI). When 
evaluating signature schemes for use with BGPsec, we consider compute time
as well as signature length. Thus, we show that our signature is shorter than 
trivial RSA when there are n > 1 signers and shorter than trivial ECDSA when 
there are n > 6 signers. (While our signature is longer than the constant-length 
[BGLS03] signature, it benefits from relying on the better-understood security
assumption of RSA.) Moreover, our scheme enjoys the same extremely fast verify 
times as RSA. 

2 Preliminaries 

Sequential Aggregate Signature Security. The security definition for aggregate
signatures (both original [BGLS03] and sequential [LMRS04]) is designed
to capture the following intuition: each signer is individually secure against
existential forgery following an adaptive chosen-message attack [GMR88] regardless
of what all the other signers do. In fact, we will allow the adversary to give 
the attacked signer arbitrary — perhaps meaningless — aggregate-so-far signatures 
during the signature queries, thus making them adaptive “chosen-message-and- 
aggregate” queries. We also allow the adversary, which we call “the forger,” to 
choose the public keys of all the other signers and to place the single signer who 
is under attack anywhere in the signature chain in the attempted forgery. This 
single attacked signer does not know any public keys other than its own and 
does not verify any aggregate-so-far given by the attacker. 

Our formal definition, presented in the full version [BGR11b], is almost verbatim
from [LMRS04], with one important difference needed to enable lazy verification:
the public keys and messages of previous signers are not input to the
signing algorithm. Therefore, each signer, by signing a message, is attesting only
to that message, not to the prior signers’ messages and public keys. At a technical
level, this change implies that in the security game the forger, in its query
to the $i$th signer, is required to supply only the aggregate-so-far signature allegedly
produced by the first $i - 1$ signers, but not the messages or public keys with
respect to which this aggregate was allegedly produced. And, of course, to be
considered successful, the forger must use a new message; in other words, it is
not enough to change a public key or message of someone else in the chain before
the attacked signer (because such public keys and messages may not even be
well defined during the attack). This definition is exactly the one that is satisfied
by the trivial solution of concatenating n individual signatures (and therefore
suffices, in particular, for BGPsec).

Fischlin, Lehmann, and Schröder [FLS11] propose a stronger security definition
for their “history-free” signatures (building on history-free MACs of
[EFG+10]), which prevents certain reordering and recombining of signatures.
Their definition thus has a security property that the trivial solution of
concatenating n individual signatures does not have. Although this security property
is not needed in many applications (for example, in BGPsec reordering and
recombining of signatures is prevented simply by the protocol message structure,
where each message must, for the purposes of functionality, include all
the signed information contained in previous messages), our signature scheme
in fact also prevents reordering and recombining that are of concern to [FLS11];
see [BGR11b].

Cryptographic Primitives. We will use pseudorandom functions [GGM86];
the definition is omitted here because it is standard, but is presented in [BGR11b]
for the sake of completeness. We will denote by $\epsilon_{\mathrm{PRF}}(q, t)$ the maximum insecurity
of PRF against any distinguisher who asks at most $q$ queries and runs in time $t$.

We assume the reader is familiar with trapdoor and claw-free permutations;
we will denote by $\pi$ the easy direction of the trapdoor permutation, by
$\pi^{-1}$ the hard direction, and by $\rho$ the function such that it is hard to find a
“claw” $x, z$ with $\pi(x) = \rho(z)$.

3 Our Basic Signature Scheme 

The intuition behind our construction is as follows. Like [Nev08], we use a
random-oracle-based signature with message recovery, similar to PSS-R [BR96],
as a basic building block. Signatures with message recovery embed a portion of
the message into the signature, so it can be recovered on verification and does
not need to be sent explicitly. In our case, the signature outputs two values: the
output $x$ of a trapdoor permutation and an additional hash value $h$. The $i$th
signer receives $(x_{i-1}, h_{i-1})$ from the previous signer and wants to sign a message
$m_i$. To enable aggregation, we view $(x_{i-1}, m_i)$ together as a “message” to be
signed with message recovery: we apply the signature with message recovery to
this pair, so that $x_{i-1}$ is embedded into the signature and does not have to be
sent explicitly. The $h$ portions of the signatures are exclusive-ored together for
aggregation.

So far, what we described is a slightly simplified version of the scheme from
[Nev08]. Note that verifying before signing is necessary in this scheme, because
the transformation from $(x_{i-1}, h_{i-1})$ to $(x_i, h_i)$ is deterministic, invertible, and
can be performed by the adversary, except for the inversion of the trapdoor
permutation performed at the last step. As we show in [BGR11b], no scheme
constructed in this manner can permit lazy verification while protecting against a
chosen-message attack. Thus, to enable lazy verification, we require each signer to
add a random string to the message, and concatenate and append these strings
to the signature. Because the adversary lacks a priori knowledge about these
random strings, the chosen-message attack becomes useless and we can prove
that this is sufficient to enable lazy verification.

Notation. We now describe the scheme precisely, using the following notation: 

- Let $m_i$ be the message signed by signer $i$.

- Let trapdoor permutation $\pi_i$ be the public key of signer $i$ and $\pi_i^{-1}$ be the
corresponding secret key. We assume all permutations operate on bit strings of
length $\ell_\pi$, i.e., have domain and range $\{0,1\}^{\ell_\pi}$. (In the full version [BGR11b]
we remove the assumption that all permutations operate on the same domain.
Section 6 uses this to instantiate $\pi$ from the RSA assumption, where $\pi_i$
is the easy direction, and $\pi_i^{-1}$ is the hard direction of the RSA permutation.)

- Let $H$ (resp. $G$) be a cryptographic hash function (modeled as a random
oracle) that outputs $\ell_H$-bit (resp. $\ell_\pi$-bit) strings.

- Let $\ell_r$ be a parameter denoting the length of the randomness appended by
each signer.

- Let the notation $\vec{a}_i$ denote a vector of values $(a_1, a_2, \dots, a_i)$.

- Let $\oplus$ denote bitwise exclusive-or. Exclusive-or is not the only operation
that can be used; any efficiently computable group operation with efficient
inverse can be used here.

- $\epsilon$ is a special character denoting the empty string; we assume $\epsilon \oplus x = x$ for
any $x$.

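Sign: The $i$th Signer’s Algorithm

Require: $\pi_i, \pi_i^{-1}, m_i, x_{i-1}, h_{i-1}$ (where $x_{i-1}, h_{i-1} = \epsilon, \epsilon$ if $i = 1$).

1: Draw $r_i \leftarrow \{0,1\}^{\ell_r}$
2: $\eta_i \leftarrow H(\pi_i, m_i, r_i, x_{i-1})$
3: $h_i \leftarrow h_{i-1} \oplus \eta_i$
4: $g_i \leftarrow G(h_i)$
5: $y_i \leftarrow g_i \oplus x_{i-1}$
6: $x_i \leftarrow \pi_i^{-1}(y_i)$
7: return $r_i, x_i, h_i$ {Note that $x_i$ and $h_i$ go to the next signer; all the $r_i$
   values go to the verifier, but only the last signer’s $x_i$ and $h_i$ do.}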

The $i$th signer’s signing algorithm has no dependency on the number of signers;
it takes in only the $i$th signer’s own public key and message and the aggregated
portion of the signature $x_{i-1}, h_{i-1}$. Moreover, the aggregated signature need not
be verified before it is signed. For verification, only a single $x_i$ and $h_i$ (namely,
the one from the last signer) is needed. However, every $r_i$, from the first signer
to the last, is needed.


Ver_{H,G}: The Verification Algorithm

Require: $\vec{\pi}_n, \vec{m}_n, \vec{r}_n, x_n, h_n$

1: for $i = n, n-1, \dots, 2$ do
2:     $y_i \leftarrow \pi_i(x_i)$
3:     $g_i \leftarrow G(h_i)$
4:     $x_{i-1} \leftarrow g_i \oplus y_i$
5:     $\eta_i \leftarrow H(\pi_i, m_i, r_i, x_{i-1})$
6:     $h_{i-1} \leftarrow h_i \oplus \eta_i$
7: if $h_1 = H(\pi_1, m_1, r_1, \epsilon)$ and $\pi_1(x_1) = G(h_1)$ then
8:     return 1
9: else
10:    return 0
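The signing and verification steps above can be sketched in Python as follows. This is only an illustrative sketch under several assumptions that are not part of the paper: the permutation is toy RSA extended to 256-bit strings by acting as the identity on values at or above the modulus (insecure, used only to get a common domain), both H and G are SHA-256 with a domain-separation tag, and the names `keygen`, `perm`, `perm_inv`, `enc`, `sign`, and `verify` are hypothetical. The empty string ε is represented by `None`.

```python
import hashlib
import secrets

ELL = 32      # bytes: common length of x, h and hash outputs in this toy (256 bits)
R_LEN = 16    # bytes of per-signer randomness r_i (128 bits)
E = 65537     # RSA public exponent

def keygen(p, q):
    """Toy RSA key from two given primes; returns (public modulus N, private d)."""
    return p * q, pow(E, -1, (p - 1) * (q - 1))

def perm(n, x):
    """Easy direction pi on ELL-byte integers: RSA below N, identity above (toy)."""
    return pow(x, E, n) if x < n else x

def perm_inv(n, d, y):
    """Hard direction pi^{-1}, computable only with the private exponent d."""
    return pow(y, d, n) if y < n else y

def enc(v):
    """Encode an integer as ELL bytes; None stands for the empty string epsilon."""
    return b"" if v is None else v.to_bytes(ELL, "big")

def _hash(tag, *parts):
    h = hashlib.sha256(tag)
    for part in parts:                       # length-prefix each field
        h.update(len(part).to_bytes(4, "big") + part)
    return int.from_bytes(h.digest(), "big")

def H(n, m, r, x_prev):
    return _hash(b"H", enc(n), m, r, enc(x_prev))

def G(h):
    return _hash(b"G", enc(h))

def sign(n, d, m, x_prev=None, h_prev=None):
    """Steps 1-7 of Sign; note that the incoming aggregate is never verified."""
    r = secrets.token_bytes(R_LEN)                   # 1: draw r_i
    eta = H(n, m, r, x_prev)                         # 2
    h = eta if h_prev is None else h_prev ^ eta      # 3: epsilon XOR eta = eta
    g = G(h)                                         # 4
    y = g if x_prev is None else g ^ x_prev          # 5
    x = perm_inv(n, d, y)                            # 6
    return r, x, h                                   # 7

def verify(pubs, msgs, rs, x, h):
    """Peel off signers n, n-1, ..., 2, then check the first signer directly."""
    for i in range(len(pubs) - 1, 0, -1):
        y = perm(pubs[i], x)
        x_prev = G(h) ^ y
        eta = H(pubs[i], msgs[i], rs[i], x_prev)
        h, x = h ^ eta, x_prev
    return h == H(pubs[0], msgs[0], rs[0], None) and perm(pubs[0], x) == G(h)
```

A chain is built by feeding each signer's `x, h` output into the next call to `sign` and collecting the `r` values; `verify` then needs only the last `x, h` together with all public keys, messages, and `r` values.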

4 Security Proof

We prove our scheme secure if $G$ and $H$ are modeled as random oracles and $\pi$
is a trapdoor permutation. The proof is easier to understand if $\pi$ is additionally
claw-free (in particular, any homomorphic permutation, such as RSA, is claw-free if
it is trapdoor). We therefore present the proof for the claw-free case. The more
general case is addressed in the full version [BGR11b]. Our proof shows how a
forger F on the aggregate signature scheme can be used to construct a reduction
R that finds a claw in a claw-free pair $(\pi^*, \rho^*)$. R has F forge a signature for a victim
signer that uses permutation $\pi^*$, and then uses the resulting forgery to find the
claw in the claw-free pair. The structure of our reduction is similar to [Nev08];
however, while [Nev08] constructs a “sequential forger” from forger F and then
constructs reduction R from the sequential forger, our reduction must proceed
in one step (since the notion of a sequential forger is undefined if hash queries
do not include previous signers’ public keys).

F’s Queries. We review what forger F expects to see on each one of its queries: 

- H-query. F asks query $Q = (\pi, m, r, x)$ (where $x$ may be $\epsilon$) and expects to
see $H(Q) = \eta$.

- G-query. F asks query $h$ and expects to see $g = G(h)$.

- Sign query. F asks query $(m, h, x)$ to be signed by $\pi^*$, and expects to see
$r, h', x'$ back, where $r$ looks uniform, $h' = h \oplus H(\pi^*, m, r, x)$, and $\pi^*(x') =
G(h') \oplus x$.

- Forgery. Finally, F outputs a forgery $\sigma = (\vec{\pi}_n, \vec{m}_n, \vec{r}_n, x_n, h_n)$ where $\pi_n =
\pi^*$. (Value $n$ is chosen by F.)

Simplifying Assumptions about the Forger F. The following simplifies 
our proof: 

- We assume that the forger F forges the last signature in the signature chain;
in other words, $\pi_n = \pi^*$ and $m_n$ is a new message never queried by F to the
signing oracle (whose public key is $\pi^*$). Indeed, any F can be easily modified
to do so: if $\pi^*$ and a new message $m_{n'}$ are present in $\vec{\pi}_n$ and $\vec{m}_n$ but at location
$n' < n$, then we can run the verification algorithm loop for $n - n'$ iterations
to obtain $x_{n'}, h_{n'}$ and output $\sigma' = (\vec{\pi}_{n'}, \vec{m}_{n'}, \vec{r}_{n'}, x_{n'}, h_{n'})$ as the new forgery,
which will be valid if and only if $\sigma$ was valid. Note that we do not assume
that $\pi^*$ (or any other public key) is present in the signature chain only once.

- We assume that before forger F outputs its forgery and halts, it makes hash
queries on all the hashes that will be computed during the verification of
its forgery. Moreover, we assume that the forger does not output an invalid
forgery; instead, it halts and outputs $\bot$. Indeed, any F can be modified to
do so; simply run the verification algorithm upon producing the forgery, and
check that $m_n$ is different from every message asked in a sign query.

4.1 Description of the Reduction R 

Data Structures Used by R. HT and GT Tables. The reduction R uses
‘programmable random oracles’, i.e., it chooses answers for random oracle
queries. R keeps track of queries whose answers have already been decided in
two tables: HT for $H$ and GT for $G$. We say $HT(Q) = \eta$ if HT stores $\eta$ as the
answer to a query $Q$, and $HT(Q) = \bot$ if HT has no answer for $Q$ (similarly for
GT).

The HTree. The key challenge for the reduction is programming $G$, since
G-queries are made on sums of H-query answers, rather than on individual H-query
answers. Thus the reduction keeps an additional data structure, the HTree, that
records responses to H-queries that may eventually be used as part of forger F’s
forgery. (HTree is inspired by the graph $\mathcal{G}$ in [Nev08, Lemma 5.3].)

The HTree is a tree of labeled nodes that stores a subset of the queries in
HT. Each node in HTree (except the root) corresponds to an H-query that could
potentially appear in the forger F’s final forgery $\sigma$: the queries asked during
verification of $\sigma$ will appear on a path from one of the leaf nodes to the root
(unless a very unlikely event occurs). The HTree has a designated root node that
stores the value $h_0 = 0$. We consider the root to be at depth 0. A node $N_i$ at
depth $i > 0$ stores:

- a pointer to its parent node,

- a query $Q_i = (\pi_i, m_i, r_i, x_{i-1})$ (where $x_{i-1} = \epsilon$ if and only if $i = 1$),

- the ‘hash-response’ values $\eta_i$ and $h_i$ ($h_i$ is the XOR of the values $\eta_1, \dots, \eta_i$
on the path from the root to the node $N_i$; equivalently, $h_i = h_{i-1} \oplus \eta_i$, where $h_{i-1}$
is stored in the parent node),

- an auxiliary value $y_i$ that is used to determine how future queries are added
to the HTree, computed as $G(h_i) \oplus x_{i-1}$ (note that $y_i$ is the value to which
the signer would apply $\pi_i^{-1}$),

- if $\pi_i = \pi^*$, an auxiliary value $z_i$ that may be used to find a claw in $(\pi^*, \rho^*)$.

Every node at depth $i = 2$ or deeper satisfies the relation $\pi_{i-1}(x_{i-1}) = y_{i-1}$,
where $\pi_{i-1}$ and $y_{i-1}$ are stored at the node’s parent. New H-queries $Q$ are added
as nodes to the HTree if they can satisfy this relation; we say that such a query
can be tethered to an existing node in the HTree. Intuitively, a query tethered
to $N_i$ becomes a child of $N_i$ in the HTree:

Definition 1 (Tethered queries). An H-query $Q$ containing $x \neq \epsilon$ is tethered
to node $N_i$ in the HTree if $N_i$ stores $\pi_i, y_i$ such that $\pi_i(x) = y_i$. If $x = \epsilon$, then $Q$
is tethered to the root of the HTree.

The HTree’s Lookup function determines the HTree node to which query $Q$ can
be tethered. (We can argue that Lookup finds at most one node with high
probability.) The HTree is populated via the Sim-H algorithm. The reduction R adds
an H-query $Q$ to the HTree if and only if it is tethered to some node in the HTree
at the time that forger F makes the H-query. It is possible that some query $Q$ is
not tethered at the time it is made, but becomes tethered at a later time (after
some new nodes are added to the HTree). However, we show that this is highly
unlikely.
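To make the tethering rule concrete, here is a minimal Python sketch of an HTree with a Lookup function. It is an illustration under assumptions, not the formal algorithm from the full version: the class `Node`, the functions `lookup` and `add_if_tethered`, and the callable `g_of` standing in for the simulated oracle G are all hypothetical names, permutations are passed as plain Python callables, and ε is represented by `None`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass(eq=False)
class Node:
    pi: Optional[Callable[[int], int]]   # permutation named in the query (None at root)
    y: Optional[int]                     # auxiliary value G(h_i) XOR x_{i-1} (None at root)
    h: int = 0                           # XOR of the eta values on the path from the root
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

def lookup(nodes, root, x):
    """Return every node to which a query containing x is tethered."""
    if x is None:                        # x = epsilon: tethered to the root
        return [root]
    return [n for n in nodes if n is not root and n.pi(x) == n.y]

def add_if_tethered(nodes, root, pi, x, eta, g_of):
    """Add the query (pi, ., ., x) with hash response eta as a child of the
    unique node it is tethered to; return None if it is not tethered."""
    hits = lookup(nodes, root, x)
    if len(hits) != 1:                   # untethered, or the unlikely ambiguous case
        return None
    parent = hits[0]
    h = parent.h ^ eta                   # h_i = h_{i-1} XOR eta_i
    y = g_of(h) ^ (0 if x is None else x)
    node = Node(pi=pi, y=y, h=h, parent=parent)
    parent.children.append(node)
    nodes.append(node)
    return node
```

The linear scan in `lookup` is chosen for clarity; it mirrors the fact that the stored functions are adversarially supplied, so each node's own `pi` has to be evaluated on `x`.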

Algorithms Used by R. The reduction R uses the following algorithms,
which are formally specified in the full version [BGR11b].

G-Queries. R answers these queries using a simple algorithm Sim-G. Sim-G
returns $GT(h)$ if it is already defined, or, if not, returns a fresh random value
and records it in the GT.

Sign-Queries. The reduction R answers queries $(m, h, x)$ to be signed by $\pi^*$
using Sim-S. Since the reduction does not know the inverse of the challenge
permutation, $(\pi^*)^{-1}$, it ‘fakes’ a valid signature by carefully assigning certain entries
in random oracle tables HT, GT, and aborts if these entries in HT, GT have
been previously assigned. We are able to argue that Sim-S is unlikely to abort,
since the entries added to HT, GT by Sim-S depend on a fresh random value $r$
chosen as part of each signature query.
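The following Python sketch illustrates how programmable random-oracle tables can answer G-queries and fake a signature without inverting the challenge permutation. It is one plausible reading of the description above, not the formal Sim-G and Sim-S from the full version: the toy 16-bit permutation `pi_star`, the table names, and the `Abort` exception are assumptions made for illustration, and ε is represented by `None`.

```python
import secrets

BITS = 16      # toy length of x, h, eta (assumption; the paper uses much longer strings)
R_BITS = 16    # toy length of the per-query randomness r

class Abort(Exception):
    """Raised when a table entry that must be programmed is already assigned."""

def pi_star(x):
    """Toy stand-in for the challenge permutation; only the easy direction is used."""
    return (3 * x + 7) % (1 << BITS)

HT, GT = {}, {}   # programmed answers of the random oracles H and G

def sim_g(h):
    """Answer a G-query: reuse GT[h] if defined, else sample and record it."""
    if h not in GT:
        GT[h] = secrets.randbits(BITS)
    return GT[h]

def sim_s(m, h, x):
    """Fake a signature on the sign query (m, h, x) under pi_star."""
    r = secrets.randbits(R_BITS)             # fresh randomness for this query
    q = ("pi*", m, r, x)
    if q in HT:
        raise Abort("H was already answered on this query")
    eta = secrets.randbits(BITS)
    h_new = (0 if h is None else h) ^ eta    # h' = h XOR H(pi*, m, r, x)
    if h_new in GT:
        raise Abort("G was already answered on h'")
    x_new = secrets.randbits(BITS)           # choose the output first ...
    HT[q] = eta
    # ... then program G so that pi*(x') = G(h') XOR x holds
    GT[h_new] = pi_star(x_new) ^ (0 if x is None else x)
    return r, h_new, x_new
```

Because `x_new` is uniform and `pi_star` is a permutation, the programmed value of G is itself uniform, which is why the forger cannot distinguish the faked answer from an honest one unless an abort occurs.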

H-Queries. The reduction R answers these queries $Q = (\pi, m, r, x)$ using
Sim-H. If there is an entry for $Q$ in the HT, then Sim-H returns it. Otherwise, it
assigns a fresh random value $\eta$ as $HT(Q)$. Next, Sim-H needs to prepare for the
event that $Q$ could lead to a forgery by the forger F, and thus needs to be stored
in the HTree. To do this, Sim-H uses the Lookup function to check if $Q$ can be
tethered and thus should be added to the HTree. If $Q$ can be tethered, Sim-H
adds a new node to the HTree containing $Q$, its hash response $\eta$, and an auxiliary
value $y$ that is used by the Lookup function to tether future H-queries. In order
to ensure that HTree is a tree, it is important to ensure that $y$ is a fresh random
value; Sim-H aborts if that’s not the case. Finally, if $Q$ contains the challenge
permutation $\pi^*$, Sim-H adds a value $z$ to the HTree node that FindClaw will
use to derive a claw from a valid forgery output by the forger F. To prepare
these values, Sim-H behaves almost as if it is ‘faking’ the answer to a sign-query,
except that instead of using the usual challenge permutation $\pi^*$ (as in Sim-S), it
uses the challenge permutation $\rho^*$ applied to $z$ (so as to benefit from forger F’s
forgery, which would invert $\pi^*$ on the output of $\rho^*(z)$, thus producing a claw).
As in Sim-S, this involves carefully assigning certain entries in GT, and aborting
if these entries are already assigned. We are able to show that Sim-H is unlikely
to abort.

Finding a Claw. Finally, forger F outputs a forgery $\sigma = (\vec{\pi}_n, \vec{m}_n, \vec{r}_n, x_n, h_n)$, where
$\pi_n = \pi^*$. Recall that our simplifying assumptions mean that the forgery is
valid. The reduction R uses FindClaw to find a claw from the forgery. Because
we assumed all the queries for verifying $\sigma$ have already been asked, the query
$(\pi^*, m_n, r_n, x_{n-1})$ is in HT. Moreover, if the forgery is valid, then with high
probability it is in the HTree as a child of the node storing $(\pi_{n-1}, m_{n-1}, r_{n-1}, x_{n-2})$,
which is in turn a child of the node storing $(\pi_{n-2}, m_{n-2}, r_{n-2}, x_{n-3})$, etc. This
holds because in a valid forgery, each H-query made during verification is tethered
to the next one, and all tethered queries are in the HTree with high probability.
The value $x_n$ (from the forgery $\sigma$) and value $z_n$ (from the HTree node of the
query $Q = (\pi^*, m_n, r_n, x_{n-1})$) constitute a claw.


656 K. Brogle, S. Goldberg, and L. Reyzin 


4.2 Analysis of the Reduction 

Theorem 1. If a forger F succeeds with probability $\epsilon$, then the reduction R finds
a claw for $(\pi^*, \rho^*)$ in about the same running time as F with probability

$$\epsilon - (q_S + q_H)(q_S + q_G + q_H)\,2^{-\ell_H} - q_S(q_S + q_H)\,2^{-\ell_r} - q_H\,2^{-\ell_\pi} \qquad (1)$$

where $q_H$ is the number of H-hash queries, $q_G$ is the number of G-hash queries,
and $q_S$ is the number of sign queries made by the forger F.

We prove this theorem in the full version of the paper [BGR11b]. The proof hinges
on two key statements about the HTree. First, the probability that Lookup($x$)
finds more than one HTree node is low (even though Lookup uses the functions $\pi$
stored in the nodes of the HTree, which do not have to be permutations, because
they are adversarially supplied and not certified like in [LMRS04]). Second, an
H-query that was not added to HTree is unlikely to become tethered at some
later time. Both statements rely on the fact that each time a query is placed on
the HTree, its $y$ value is random and independent of every other $y$ value.


5 Shorter Signatures via Input-Dependent Randomness 

To shorten our signature, we now show how to reduce $\ell_r$ (the length of the
randomness appended by each signer). To do this, we replace the truly random $r$
from our basic scheme with an $r$ that is computed as a function of the inputs to
the signer, and argue that it can be made shorter than the random $r$. Intuitively,
we are able to maintain security with a shorter $r$ because a given signer never
produces two different signatures on the same input, thus limiting the information
that an adversary can see and exploit. Of course, this input-dependent $r$
need not be truly random; it suffices for $r$ to be a pseudorandom function of
the input.

5.1 Modifying the Scheme 

We now compute $r$ as a pseudorandom function (PRF) over the input $(m_i,
h_{i-1}, x_{i-1})$ received by signer $i$. Let $\mathrm{PRF}_{seed} : \{0,1\}^* \to \{0,1\}^{\ell_r}$ be a PRF
with seed $seed$ and insecurity $\epsilon_{\mathrm{PRF}}(q, t)$ against adversaries asking $q$ queries and
running in time $t$. Add a uniformly chosen seed to the secret key of the signer
and replace line 1 of the signing algorithm with $r \leftarrow \mathrm{PRF}_{seed}(m, h, x)$.
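As a small illustration of input-dependent randomness, the sketch below derives $r$ with HMAC-SHA-256 from Python's standard library. The function name `derive_r`, the length-prefixed encoding of the three inputs, and the truncation of the HMAC output to 128 bits are assumptions made for this example; they are not taken from the paper's specification.

```python
import hashlib
import hmac

R_LEN = 16  # bytes, i.e. ell_r = 128 bits

def _lp(b: bytes) -> bytes:
    """Length-prefix a field so that distinct input triples encode differently."""
    return len(b).to_bytes(4, "big") + b

def derive_r(seed: bytes, m: bytes, h_prev: bytes, x_prev: bytes) -> bytes:
    """r = PRF_seed(m, h, x): deterministic in the signer's input."""
    data = _lp(m) + _lp(h_prev) + _lp(x_prev)
    return hmac.new(seed, data, hashlib.sha256).digest()[:R_LEN]
```

Calling `derive_r` twice with the same seed and the same `(m, h_prev, x_prev)` returns the same string, which is exactly the property that gives the forger only one chance per input.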

In the previous section, we found that $\ell_r$ must be long enough to tolerate
a security loss of $q_S(q_H + q_S)2^{-\ell_r}$ (Theorem 1). As we show below, $\ell_r$ in the
modified scheme can be shorter, since it needs only to allow for a security loss of
approximately $(q_G + q_H + q_S + \ell_H q_S)2^{-\ell_r}$. This is an improvement if we assume
that $q_H \approx q_G$ (since both $H$ and $G$ are hash functions) and $q_S \ll q_H$ (since in
practice hash queries can be made offline, while signing queries need access to
an actual signer).

5.2 Key Insight for the Security Proof

Using the reduction of Section 4, we had to choose $r$ long enough to make
it unlikely that when a forger makes a sign query on $(\pi^*, m_i, x_{i-1}, h_{i-1})$, the
algorithm Sim-S draws a random $r_i$ that collides with a previously made H-query
$Q_i = (\pi^*, m_i, r_i, x_{i-1})$. Indeed, if $Q_i$ was answered by $\eta_i$ and the forger
chooses $h_{i-1}$ so that $h_i$ (which is computed as $h_{i-1} \oplus \eta_i$) has already been queried
to $G$, then when $r$ collides, the reduction would be prevented from programming
the random oracle $G(h_i)$. Making $r$ depend on the forger’s input to the signer
means that the forger gets only one chance (rather than $q_S$ chances) to make this
happen for a given $Q_i$, $h_{i-1}$, and $h_i$, because subsequent attempts by the forger
will use the same $r$.

We show in the full version [BGR11b] that the problem of proving this modified
scheme secure hinges on the following combinatorial problem.

Combinatorial Problem. Suppose $\beta$ values $\eta_1, \dots, \eta_\beta$ are chosen uniformly at
random as $\ell_H$-bit strings and given to an adversary, who then chooses $\alpha$ distinct
values $h'_1, \dots, h'_\alpha$. An $\alpha \times \beta$ matrix is constructed by XORing the $\eta$ and the
$h'$ values. A collision in this matrix is a set of entries that are all equal. What is the total
number of entries in the $\gamma$ biggest collisions?

Theorem 2. With probability at least $1 - \beta^2 2^{-\ell_H}$, the total size of the $\gamma$ biggest
collisions in this matrix is at most $\alpha + (\ell_H + 2)\gamma^2$.
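The quantity in the combinatorial problem can be computed directly for small instances. The Python sketch below is only an illustration: the function names `biggest_collisions` and `theorem2_bound` are hypothetical, values are modeled as Python integers, and a "collision" is taken to be the maximal set of matrix entries sharing one value.

```python
from collections import Counter

def biggest_collisions(etas, hs, gamma):
    """Total number of entries in the gamma largest groups of equal entries
    of the matrix whose (i, j) entry is hs[i] XOR etas[j]."""
    counts = Counter(h ^ eta for h in hs for eta in etas)
    return sum(c for _, c in counts.most_common(gamma))

def theorem2_bound(alpha, ell_h, gamma):
    """The bound alpha + (ell_H + 2) * gamma^2 from the theorem statement."""
    return alpha + (ell_h + 2) * gamma ** 2
```

For example, with `etas = [1, 2]` and `hs = [0, 3]` the matrix entries are 1, 2, 2, 1, so the single biggest collision has two entries and the two biggest together cover all four.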

The proof of this theorem, as well as the entire security analysis of the modified
scheme, are found in [BGR11b].

6 Implementation and Evaluation 

In the full version of the paper [BGR11b] we present details of instantiating
our scheme with RSA (these include, in particular, dealing with the problem of
slightly different domains for each signer’s permutation). We implemented the
input-dependent-$r$ version as a module in OpenSSL [Ope]. The code is available
from [BGR11a].

Overview of Our Implementation. We instantiate the permutation $\pi$ with
2048-bit RSA with public exponent 65537, hash $H$ with SHA-256, full-domain
hash $G$ with the industry-standard Mask Generating Function (MGF) using
SHA-256 [RSA02], and the pseudorandom function PRF with HMAC-SHA-256
[BCK96]. Instead of hashing the permutation $\pi$ as-is inside the hash function
$H$, we replace it with a short fingerprint of the RSA public key computed using
SHA-256. Thus, we have parameters $\ell_\pi = 2048$, $\ell_H = 256$, and $\ell_r = 128$; the $\ell_r$
value is per signer, and each signer also adds one bit of information to deal with
the problem that RSA gives each signer a slightly different domain. Therefore,
the length of the aggregate signature for n signers is 2048 + 256 + 129n bits
(see Table 1). We justify this choice of parameters in [BGR11b].
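The length formula can be checked against the trivial solutions with a few lines of Python. This is a worked-arithmetic sketch: the function names are hypothetical, and the 512-bit figure for a 256-bit ECDSA signature is taken from Table 1.

```python
def length_ours(n, ell_pi=2048, ell_h=256, per_signer=129):
    """Aggregate length in bits: one x, one h, plus (ell_r + 1) bits per signer."""
    return ell_pi + ell_h + per_signer * n

def length_trivial_rsa(n, bits=2048):
    """n concatenated 2048-bit RSA signatures."""
    return bits * n

def length_trivial_ecdsa(n, bits=512):
    """n concatenated 256-bit ECDSA signatures (512 bits each)."""
    return bits * n

def first_n_where_shorter(other):
    """Smallest integer n at which our aggregate is strictly shorter than `other`."""
    n = 1
    while length_ours(n) >= other(n):
        n += 1
    return n
```

Under these formulas the aggregate becomes shorter than trivial RSA from two signers on and shorter than trivial ECDSA from seven signers on, since 2304 + 129n < 512n exactly when n > 2304/383, which is just above 6.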

Table 1. Benchmark results for n signers. Computed on a laptop with a Core i3
processor at 2.4GHz and 2GB RAM, running Ubuntu. The first three schemes were
implemented using OpenSSL [Ope] (with SHA-256 hashing and RSA public exponent of
65537); the BGLS scheme was implemented using MIRACL [Sco11] (with the curve BN-128
[BN05] and with precomputation on the curve generator but not on the public keys;
further precomputation on the public keys seems to improve verification performance
by up to 20% at the cost of additional storage). Results for specific values of n are not
exactly in proportion due to rounding.

                          2048-bit RSA   Our scheme    256-bit ECDSA   256-bit BGLS
Signature length (bits)   2048n          2304 + 129n   512n            257
Length for n = 4.5        9216           2885          2304            257
Length for n = 7          14336          3207          3584            257
Sign time (ms)            11.8           11.9          2.3             1.9
Verify time (ms)          0.3n           0.3n          2.8n            ≈ 18.9 + 6.6n
Verify time for n = 4.5   1.3            1.3           12.5            47.6
Verify time for n = 7     2.1            2.1           19.4            64.8

Evaluation. We compare the implementation described in the previous paragraph
to other signature schemes that allow for lazy verification. Table 1 contains
data on our scheme as well as the “trivial” solution of using n RSA signatures, the
solution of similarly using n ECDSA [Van92, IEE02] signatures (which are current
contenders for adoption in BGPsec [Sri12, Section 4.1]), and the aggregate scheme
of [BGLS03] (we do not compare against [FLS11], because it is a more complicated
version of [BGLS03], so [BGLS03] performs better than [FLS11] anyway). In
addition to providing formulas in terms of the number n of signers, we show results
for specific values of n = 4.5 and n = 7. The value of 4.5 was chosen because
it is roughly the average length of an AS path for a well-connected router on the
Internet today (average length fluctuates with time and vantage point; see, e.g.,
[Smi12]). We should note, however, that performance for higher than average
values of n is particularly important: transition to BGPsec is expected to be
particularly problematic for weaker routers, which are more likely to be located in
the less well-connected portions of the Internet, and which experience longer than
average paths. We therefore also show results for n = 7.

The table shows that the [BGLS03] scheme is a clear winner in terms of
signature length and signing time, but has considerably slower verification.⁴
It should be noted, however, that it is not being considered for the BGPsec
standard at this stage [Sri12, Section 4.1]: schemes relying on bilinear
Diffie-Hellman are not considered ready for worldwide deployment on the internet
backbone by the BGPsec working group, because a consensus has not emerged
on which curves provide the right tradeoff between security and efficiency (for
example, there is not a NIST-approved set of curves such as the one contained
in [NIS09, Appendix D] for non-pairing-based elliptic-curve cryptography). It

4 A more efficient pairing-based s cheme o f I' WMOSI with a constant total number of 
pairings was shown insecure by |SVS + 09) . 
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is also important to note that the time required to compute group operations 
and bilinear pairings depends very heavily on the curve used; improvements for 
various curves are produced frequently, and there is no generally accepted set of 
curves or algorithms at this point. We believe that, assuming continued progress 
to speed-up pairings on specific curves and sufficient confidence in the security 
of bilinear DifRe-Hellman on these curves, the scheme of (as improved 

by |HNN07j l should be considered for real applications. 

As far as the remaining three schemes are concerned, we observe that ECDSA
provides the shortest signatures when n < 6, while our scheme dominates the
three for n > 6 (as we already mentioned, performance for higher than average
n is particularly important). We also observe that our scheme has computation
time almost identical to simple RSA while having much shorter signatures
(RSA signature length is listed as a particular concern in [Sri12, Section
4.1.2]). While ECDSA has the fastest signing time, the verification times for
RSA and our scheme are an order of magnitude faster than those of ECDSA. Note
that, for a router, the time required to sign does not depend on n, but the
time required to verify grows linearly with n, so verification times are also
of particular importance to weaker routers at the edge of the network.
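
The crossover between ECDSA and our scheme can be recomputed from the formulas in the table. The following is only an illustrative sketch: the dictionary keys and function names are our own labels, and the formulas are copied from the table rather than measured.

```python
# Signature-length formulas (bits) as functions of the number of signers n,
# copied from the comparison table; the keys are illustrative labels only.
LENGTH_BITS = {
    "RSA": lambda n: 2048 * n,
    "ours": lambda n: 2304 + 129 * n,
    "ECDSA": lambda n: 512 * n,
    "BGLS": lambda n: 257,
}

# Verification-time formulas (ms) from the same table (BGLS is approximate).
VERIFY_MS = {
    "RSA": lambda n: 0.3 * n,
    "ours": lambda n: 0.3 * n,
    "ECDSA": lambda n: 2.8 * n,
    "BGLS": lambda n: 18.9 + 6.6 * n,
}


def crossover_ours_vs_ecdsa():
    """Solve 2304 + 129 n = 512 n for n."""
    return 2304 / (512 - 129)


def shortest_non_pairing(n):
    """Name of the scheme with the shortest signature among RSA, ours, ECDSA."""
    return min(("RSA", "ours", "ECDSA"), key=lambda s: LENGTH_BITS[s](n))


if __name__ == "__main__":
    print("crossover at n =", round(crossover_ours_vs_ecdsa(), 2))
    for n in (4.5, 7):
        print(n, shortest_non_pairing(n), VERIFY_MS["ours"](n), VERIFY_MS["ECDSA"](n))
```

The break-even point is slightly above n = 6, which matches the observation that ECDSA is shorter for short paths and our scheme for longer ones.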

Thus, if one is interested in a scheme based on the standard assumption of 
trapdoor permutations (albeit in the random oracle model), then our proposal 
fits the bill. Moreover, even if one is willing to accept security of ECDSA (which 
is not known to follow from any standard assumptions), our scheme may be 
preferable based on fast verify times and comparable-length signatures. Our
scheme also has much faster verification than pairing-based BGLS.


Acknowledgements. We thank Anna Lysyanskaya for help with the early 
stages of this work, Hovav Shacham and Craig Costello for helpful pointers 
and explanations, and the DHS S&T CSD Secure Routing project for many 
useful discussions that informed the design of our schemes. We thank anonymous 
referees for helping us improve our presentation and for pointing out related 
work. This work was supported by NSF Grants 1017907, 0546614, 0831281, 
1012910, 1012798, and a gift from Cisco. 


References 


AGH10. Ahn, J.H., Green, M., Hohenberger, S.: Synchronized aggregate signatures:
new definitions, constructions and applications. In: ACM Conference on
Computer and Communications Security (2010)

BCK96. Bellare, M., Canetti, R., Krawczyk, H.: Keying Hash Functions for Mes- 
sage Authentication. In: Koblitz, N. (ed.) CRYPTO 1996. LNCS, vol. 1109, 
pp. 1-15. Springer, Heidelberg (1996) 

BGLS03. Boneh, D., Gentry, C., Lynn, B., Shacham, H.: Aggregate and Verifiably 
Encrypted Signatures from Bilinear Maps. In: Biham, E. (ed.) EURO- 
CRYPT 2003. LNCS, vol. 2656, pp. 416-432. Springer, Heidelberg (2003) 


BGOY07. Boldyreva, A., Gentry, C., O’Neill, A., Yum, D.H.: Ordered
multisignatures and identity-based sequential aggregate signatures, with
applications to secure routing. In: ACM Conference on Computer and
Communications Security, pp. 276-285. ACM (2007)

BGR11a. Brogle, K., Goldberg, S., Reyzin, L.: Implementation of sequential
aggregate signatures with lazy verification (2011),
http://www.cs.bu.edu/fac/goldbe/papers/bgpsec-sigs.html

BGR11b. Brogle, K., Goldberg, S., Reyzin, L.: Sequential aggregate signatures
with lazy verification, Full version and implementation code (2011),
http://www.cs.bu.edu/fac/goldbe/papers/bgpsec-sigs.html

BHMT09. Houidi, Z.B., Meulle, M., Teixeira, R.: Understanding slow BGP routing
table transfers. In: Proc. ACM SIGCOMM Internet Measurement Conference,
pp. 350-355. ACM, New York (2009)

BJ10. Bagherzandi, A., Jarecki, S.: Identity-Based Aggregate and
Multi-Signature Schemes Based on RSA. In: Nguyen, P.Q., Pointcheval, D. (eds.)
PKC 2010. LNCS, vol. 6056, pp. 480-498. Springer, Heidelberg (2010)

BN05. Barreto, P.S.L.M., Naehrig, M.: Pairing-Friendly Elliptic Curves of Prime
Order. In: Preneel, B., Tavares, S. (eds.) SAC 2005. LNCS, vol. 3897,
pp. 319-331. Springer, Heidelberg (2006)

BNN07. Bellare, M., Namprempre, C., Neven, G.: Unrestricted Aggregate
Signatures. In: Arge, L., Cachin, C., Jurdzinski, T., Tarlecki, A. (eds.)
ICALP 2007. LNCS, vol. 4596, pp. 411-422. Springer, Heidelberg (2007)

BR93. Bellare, M., Rogaway, P.: Random oracles are practical: A paradigm for
designing efficient protocols. In: ACM Conference on Computer and
Communications Security, pp. 62-73 (1993)

BR96. Bellare, M., Rogaway, P.: The Exact Security of Digital Signatures - How
to Sign with RSA and Rabin. In: Maurer, U.M. (ed.) EUROCRYPT 1996. LNCS,
vol. 1070, pp. 399-416. Springer, Heidelberg (1996)

CHKM10. Chatterjee, S., Hankerson, D., Knapp, E., Menezes, A.: Comparing two
pairing-based aggregate signature schemes. Des. Codes Cryptography 55(2-3),
141-167 (2010)

CID. The CIDR report, http://www.cidr-report.org

CLGW06. Cheng, X., Liu, J., Guo, L., Wang, X.: Identity-based multisignature
and aggregate signature schemes from m-torsion groups. Journal of Electronics
(China) 23(4) (July 2006)

CLW05. Cheng, X., Liu, J., Wang, X.: Identity-Based Aggregate and Verifiably
Encrypted Signatures from Bilinear Pairing. In: Gervasi, O., Gavrilova, M.L.,
Kumar, V., Lagana, A., Lee, H.P., Mun, Y., Taniar, D., Tan, C.J.K. (eds.)
ICCSA 2005. LNCS, vol. 3483, pp. 1046-1054. Springer, Heidelberg (2005)

Cor02. Coron, J.-S.: Optimal Security Proofs for PSS and Other Signature
Schemes. In: Knudsen, L.R. (ed.) EUROCRYPT 2002. LNCS, vol. 2332,
pp. 272-287. Springer, Heidelberg (2002)

COZ08. Chi, Y.-J., Oliveira, R., Zhang, L.: Cyclops: The Internet AS-level
observatory. In: ACM SIGCOMM CCR (2008)

CSC09. Chakrabarti, S., Chandrasekhar, S., Singhal, M., Calvert, K.L.: An
efficient and scalable quasi-aggregate signature scheme based on LFSR
sequences. IEEE Trans. Parallel Distrib. Syst. 20(7), 1059-1072 (2009)

DHS. Department of Homeland Security, Science and Technology Directorate,
Cyber Security Division, Secure Protocols for Routing Infrastructure project.
Personal Communication


EFG+10. Eikemeier, O., Fischlin, M., Götzmann, J.-F., Lehmann, A., Schröder, D.,
Schröder, P., Wagner, D.: History-Free Aggregate Message Authentication Codes.
In: Garay, J.A., De Prisco, R. (eds.) SCN 2010. LNCS, vol. 6280, pp. 309-328.
Springer, Heidelberg (2010)

FLS11. Fischlin, M., Lehmann, A., Schröder, D.: History-free sequential
aggregate signatures. Technical Report 2011/231, Cryptology ePrint archive
(2011), http://eprint.iacr.org

GGM86. Goldreich, O., Goldwasser, S., Micali, S.: How to construct random
functions. J. ACM 33(4), 792-807 (1986)

GMR88. Goldwasser, S., Micali, S., Rivest, R.L.: A digital signature scheme
secure against adaptive chosen-message attacks. SIAM J. Comput. 17(2), 281-308
(1988)

GR06. Gentry, C., Ramzan, Z.: Identity-Based Aggregate Signatures. In: Yung, M.,
Dodis, Y., Kiayias, A., Malkin, T. (eds.) PKC 2006. LNCS, vol. 3958,
pp. 257-273. Springer, Heidelberg (2006)

Her06. Herranz, J.: Deterministic identity-based signatures for partial
aggregation. Comput. J. 49(3), 322-330 (2006)

HLY09. Hwang, J.Y., Lee, D.H., Yung, M.: Universal forgery of the identity-based
sequential aggregate signature scheme. In: Li, W., Susilo, W., Tupakula, U.K.,
Safavi-Naini, R., Varadharajan, V. (eds.) ASIACCS, pp. 157-160. ACM (2009)

Hus12. Huston, G. (ed.): The Profile for Algorithms and Key Sizes for Use in the
Resource Public Key Infrastructure (RPKI). IETF RFC 6485 (February 2012),
http://tools.ietf.org/html/rfc6485

IEE02. IEEE Std 1363-2000. IEEE standard specifications for public-key
cryptography (2002)

KLS00. Kent, S., Lynn, C., Seo, K.: Secure border gateway protocol (S-BGP).
J. Selected Areas in Communications 18(4), 582-592 (2000)

KR06. Karpilovsky, E., Rexford, J.: Using forgetful routing to control BGP
table size. In: Proceedings of the 2006 ACM CoNEXT Conference, CoNEXT 2006,
pp. 2:1-2:12. ACM, New York (2006)

KW03. Katz, J., Wang, N.: Efficiency improvements for signature schemes with
tight security reductions. In: Jajodia, S., Atluri, V., Jaeger, T. (eds.) ACM
Conference on Computer and Communications Security, pp. 155-164. ACM (2003)

Lep12. Lepinski, M. (ed.): BGPSEC Protocol Specification. IETF Network Working
Group, Internet-Draft (July 2012),
http://tools.ietf.org/html/draft-ietf-sidr-bgpsec-protocol-04

LMRS04. Lysyanskaya, A., Micali, S., Reyzin, L., Shacham, H.: Sequential
Aggregate Signatures from Trapdoor Permutations. In: Cachin, C., Camenisch, J.
(eds.) EUROCRYPT 2004. LNCS, vol. 3027, pp. 74-90. Springer, Heidelberg (2004)

LOS+06. Lu, S., Ostrovsky, R., Sahai, A., Shacham, H., Waters, B.: Sequential
Aggregate Signatures and Multisignatures Without Random Oracles. In:
Vaudenay, S. (ed.) EUROCRYPT 2006. LNCS, vol. 4004, pp. 465-485. Springer,
Heidelberg (2006)

Nev08. Neven, G.: Efficient Sequential Aggregate Signed Data. In: Smart, N.P.
(ed.) EUROCRYPT 2008. LNCS, vol. 4965, pp. 52-69. Springer, Heidelberg (2008)

NIS09. FIPS publication 186-3: Digital signature standard (DSS) (June 2009),
http://csrc.nist.gov/publications/PubsFIPS.html


OpenSSL toolkit, http://openssl.org/

RS09. Rückert, M., Schröder, D.: Aggregate and Verifiably Encrypted Signatures
from Multilinear Maps without Random Oracles. In: Park, J.H., Chen, H.-H.,
Atiquzzaman, M., Lee, C., Kim, T.-h., Yeo, S.-S. (eds.) ISA 2009. LNCS,
vol. 5576, pp. 750-759. Springer, Heidelberg (2009)

RSA78. Rivest, R.L., Shamir, A., Adleman, L.M.: A method for obtaining digital
signatures and public-key cryptosystems. Commun. ACM 21(2), 120-126 (1978)

RSA02. PKCS #1: RSA Encryption Standard. Version 2.1. RSA Laboratories
(June 2002), ftp://ftp.rsasecurity.com/pub/pkcs/pkcs-1/pkcs-1v2-1.pdf

Sco11. Michael Scott. MIRACL library (2011), http://www.shamus.ie/

Smi12. Smith, P.: BGP routing table analysis (2012),
http://thyme.rand.apnic.net/. See historical data, e.g., APNIC analysis
summary for (September 7, 2012),
http://thyme.apnic.net/ap-data/2012/09/07/0400/mail-global

Sri12. Sriram, K. (ed.): BGPSEC Design Choices and Summary of Supporting
Discussions. The Internet Engineering Task Force (IETF) Network Working Group
(July 2012), http://tools.ietf.org/html/draft-sriram-bgpsec-design-choices-02

SVS+09. Sharmila Deva Selvi, S., Sree Vivek, S., Shriram, J., Kalaivani, S.,
Pandu Rangan, C.: Security analysis of aggregate signature and batch
verification signature schemes. Technical Report 2009/290, Cryptology ePrint
archive (2009), http://eprint.iacr.org

SVSR10. Sharmila Deva Selvi, S., Sree Vivek, S., Shriram, J., Pandu Rangan, C.:
Identity based partial aggregate signature scheme without pairing. Report
2010/461, Cryptology ePrint archive (2010), http://eprint.iacr.org

Van92. Vanstone, S.: Responses to NIST’s proposal. Communications of the ACM 35,
50-52 (1992)

WM08. Wen, Y., Ma, J.: An aggregate signature scheme with constant pairing
operations. In: CSSE (3). IEEE Computer Society (2008)

XZF05. Xu, J., Zhang, Z., Feng, D.: ID-Based Aggregate Signatures from Bilinear
Pairings. In: Desmedt, Y.G., Wang, H., Mu, Y., Li, Y. (eds.) CANS 2005. LNCS,
vol. 3810, pp. 110-119. Springer, Heidelberg (2005)

YCK04. Yoon, H., Cheon, J.H., Kim, Y.: Batch Verifications with ID-Based
Signatures. In: Park, C., Chee, S. (eds.) ICISC 2004. LNCS, vol. 3506,
pp. 233-248. Springer, Heidelberg (2005)

ZSN05. Zhao, M., Smith, S.W., Nicol, D.M.: Aggregated path authentication for
efficient BGP security. In: ACM Conference on Computer and Communications
Security, pp. 128-138. ACM (2005)


Commitments and Efficient Zero-Knowledge 
Proofs from Learning Parity with Noise* 


Abhishek Jain¹, Stephan Krenn², Krzysztof Pietrzak², and Aris Tentes³

¹ Massachusetts Institute of Technology, USA, and
Boston University, USA
abhishek@csail.mit.edu
² Institute of Science and Technology Austria
{stephan.krenn, pietrzak}@ist.ac.at
³ Department of Computer Science, New York University, USA
tentes@cs.nyu.edu


Abstract. We construct a perfectly binding string commitment scheme
whose security is based on the learning parity with noise (LPN) assumption,
or equivalently, the hardness of decoding random linear codes. Our
scheme not only allows for a simple and efficient zero-knowledge proof of
knowledge for committed values (essentially a Σ-protocol), but also for
such proofs showing any kind of relation amongst committed values, i.e.,
proving that messages $m_0, \ldots, m_u$ are such that
$m_0 = C(m_1, \ldots, m_u)$ for any circuit $C$.

To get soundness which is exponentially small in a security parameter
$t$, and when the zero-knowledge property relies on the LPN problem with
secrets of length $\ell$, our 3-round protocol has communication complexity
$O(t|C|\ell \log(\ell))$ and computational complexity of $O(t|C|\ell)$ bit
operations. The hidden constants are small, and the computation consists
mostly of computing inner products of bit-vectors.

1 Introduction 

Commitment schemes and zero-knowledge proofs are fundamental cryptographic 
primitives. In this work we propose a simple string commitment scheme and show 
efficient zero-knowledge proofs for any relation amongst committed values. The 
security (more precisely, the computational hiding property) of our commitment 
scheme relies on the learning parity with noise (LPN) assumption, or equivalently, 
on the hardness of decoding random linear codes. 

Commitment schemes. A commitment scheme allows a party to commit to a
message $m$ by publishing a commitment $\sigma$, and this commitment can be
opened at a later point in time. The required security properties are called
hiding and binding. Hiding means that one cannot learn anything about the
committed message $m$ from the commitment $\sigma$; binding means that one
cannot open a commitment $\sigma$ to two different messages $m \neq m'$.

* This work was in part supported by the European Research Council under the
European Union's Seventh Framework Programme (FP7/2007-2013) / ERC Starting
Grant (259668-PSPC).

X. Wang and K. Sako (Eds.): ASIACRYPT 2012, LNCS 7658, pp. 663-680, 2012.

(c) International Association for Cryptologic Research 2012

In our scheme, the commitment to a message m is simply the encoding of m 
using a random linear code, with some noise added to the codeword. Exploiting 
the linear structure of this scheme, we get simple and efficient zero-knowledge 
proofs for linear and multiplicative relations of committed values.

Zero-knowledge proofs of knowledge. Zero-knowledge proofs of knowledge are
two-party protocols, which allow a prover to convince a verifier that it knows
some secret piece of information, without the verifier being able to learn anything 
about the secret value except for what is revealed by the claim itself. 

The LPN assumption. The computationally hard problem underlying the security
(i.e., the computational hiding property) of our commitment scheme is the
learning parity with noise (LPN) problem. This problem asks to distinguish
“noisy” linear equations $\mathbf{A}.\mathbf{s} \oplus \mathbf{e}$ from
uniformly random. Here $\mathbf{A}$ is a “skinny” public random binary
$k \times \ell$ matrix, $\mathbf{s}$ is a uniformly random $\ell$-bit secret and
$\mathbf{e}$ is a random vector of low weight (the exact distribution of
$\mathbf{e}$ is discussed in Section 2.2). The LPN problem has found numerous
applications as the assumption underlying provably secure cryptosystems, like
secret-key and public-key authentication schemes or symmetric encryption.

LPN based cryptosystems are interesting for theoretical and practical
reasons. On the one hand, the LPN problem is equivalent to the problem of
decoding random linear codes, a problem that has been studied for over half a
century. The best known algorithms need $2^{\Theta(\ell/\log \ell)}$ time and
samples (the number of samples is given by the number $k$ of rows of
$\mathbf{A}$). If $k = \Theta(\ell)$ is linear in $\ell$, as will be the case in
this paper, the best algorithms need exponential time. Furthermore, unlike
most number-theoretic problems used in cryptography, the LPN problem is not
known to become insecure against quantum algorithms. On the practical side,
LPN based cryptosystems tend to be extremely simple and efficient, and thus
are good candidates for weak devices like RFID tags, where existing
cryptographic algorithms cannot be implemented due to constraints on code
size, running time or memory.

1.1 Our Contributions 

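Commitments from LPN. In our scheme the commitment to a message
$\mathbf{m} \in \mathcal{I}^v$ (where $\mathcal{I} = \{0, 1\}$) is simply

$$\mathsf{Com}(\mathbf{m}) = \mathbf{A}.(\mathbf{r}\|\mathbf{m}) \oplus \mathbf{e},$$

where $\mathbf{A} = \mathbf{A}'\|\mathbf{A}'' \in \mathcal{I}^{k \times (\ell+v)}$
is a public random binary matrix, $\mathbf{r} \in \mathcal{I}^{\ell}$ is a
uniformly random vector and $\mathbf{e} \in \mathcal{I}^{k}$ is a random
low-weight vector. To open a commitment $\sigma$, one reveals
$\mathbf{r}, \mathbf{m}, \mathbf{e}$ and checks if
$\sigma = \mathbf{A}.(\mathbf{r}\|\mathbf{m}) \oplus \mathbf{e}$ and
$\mathbf{e}$ is low weight. Here the length $\ell = |\mathbf{r}|$ is chosen such
that the LPN problem with secrets of length $\ell$ is hard. The length
$v = |\mathbf{m}|$ of the message can be arbitrary, but for efficiency reasons
it is best to choose it roughly of the same size as $\ell$.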
Setting $k = \Theta(v + \ell)$ large enough, the commitment scheme becomes
computationally hiding and perfectly binding (with overwhelming probability
over the choice of $\mathbf{A}$). The binding property follows from the large
distance of the code generated by the random matrix $\mathbf{A}$; the hiding
property follows directly from the LPN assumption, which implies that
$\mathbf{A}'.\mathbf{r} \oplus \mathbf{e}$ is pseudorandom.
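
The commit and open steps described above can be sketched in a few lines. This is a toy illustration only, not the authors' implementation: the function names, the use of an exact error weight `w`, and the tiny parameters are all assumptions made for readability, and such parameters offer no security.

```python
import random


def keygen(k, l, v, rng):
    """Public key: a random binary k x (l+v) matrix, stored as a list of rows."""
    return [[rng.randrange(2) for _ in range(l + v)] for _ in range(k)]


def mat_vec(A, x):
    """Matrix-vector product over GF(2)."""
    return [sum(a & b for a, b in zip(row, x)) % 2 for row in A]


def commit(A, m, l, w, rng):
    """Return (c, (r, e)) with c = A.(r||m) XOR e and e of Hamming weight w."""
    k = len(A)
    r = [rng.randrange(2) for _ in range(l)]
    e = [0] * k
    for i in rng.sample(range(k), w):  # w distinct error positions
        e[i] = 1
    c = [y ^ ei for y, ei in zip(mat_vec(A, r + m), e)]
    return c, (r, e)


def verify(A, c, m, opening, w):
    """Accept iff c = A.(r||m) XOR e and e has weight exactly w."""
    r, e = opening
    if len(e) != len(A) or sum(e) != w:
        return False
    return c == [y ^ ei for y, ei in zip(mat_vec(A, r + m), e)]


if __name__ == "__main__":
    rng = random.Random(2012)
    A = keygen(k=64, l=16, v=16, rng=rng)  # toy sizes, far too small to be secure
    m = [1, 0] * 8
    c, opening = commit(A, m, l=16, w=8, rng=rng)
    print(verify(A, c, m, opening, w=8))
```

Opening with a different message fails the equality check unless the corresponding columns of the matrix cancel, which is what the distance of the random code rules out for suitable parameters.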

Zero-knowledge protocols for arbitrary circuits. We construct a zero-knowledge
proof of knowledge, which is basically a so-called Σ-protocol, that allows one
to prove knowledge of the message $\mathbf{m}$ hidden inside a commitment
without revealing anything about it. Furthermore, we give a protocol for
proving that committed messages $\mathbf{m}_0, \mathbf{m}_1, \mathbf{m}_2$
satisfy a linear relation
$\mathbf{m}_0 = \mathbf{X}_1.\mathbf{m}_1 \oplus \mathbf{X}_2.\mathbf{m}_2$ (for
any square matrices $\mathbf{X}_1, \mathbf{X}_2$). Based on this protocol, we
construct proofs for any bitwise relation
$\mathbf{m}_0 = \mathbf{m}_1 \circ \mathbf{m}_2$, where $\circ$ can be any
bitwise relation like AND, NAND, OR, NOR. As NAND is functionally complete, we
can prove relations $\mathbf{m}_0 = C(\mathbf{m}_1, \ldots, \mathbf{m}_t)$ for
any boolean circuit $C$.
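
The remark that NAND is functionally complete can be made concrete with a short sketch that derives the other bitwise gates from NAND alone. The helper names are our own; this only illustrates the circuit-level fact, not the proof system itself.

```python
def nand(a, b):
    """Bitwise NAND of two equal-length bit-vectors."""
    return [1 - (x & y) for x, y in zip(a, b)]


def not_(a):
    return nand(a, a)


def and_(a, b):
    return not_(nand(a, b))


def or_(a, b):
    return nand(not_(a), not_(b))


def nor(a, b):
    return not_(or_(a, b))


def xor(a, b):
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))


if __name__ == "__main__":
    # The two vectors together enumerate all four input combinations.
    a, b = [0, 0, 1, 1], [0, 1, 0, 1]
    for gate in (and_, or_, nor, xor):
        print(gate.__name__, gate(a, b))
```

Since every gate of a boolean circuit can be rewritten this way, a proof for bitwise NAND relations suffices for arbitrary circuits.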

For $\mathbf{A} \in \mathcal{I}^{k \times m}$, the communication complexity of
our proofs is $O(k \log k)$. Setting $v = \ell$, we can set
$k = \Theta(v + \ell) = \Theta(v)$, thus the proofs are quasilinear in the
length of the committed messages. The soundness error of our protocol is 2/3.
To get soundness errors of $2^{-16}$ and $2^{-32}$ as specified by the
ISO/IEC-9798-5 standard we would need 28 and 55 repetitions, respectively.
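
The repetition counts follow from requiring $(2/3)^r \le 2^{-t}$. A minimal sketch of that calculation, with a function name of our own choosing:

```python
import math


def repetitions(soundness_error, target_bits):
    """Smallest r with soundness_error**r <= 2**(-target_bits)."""
    r = math.ceil(target_bits / -math.log2(soundness_error))
    # Guard against floating-point rounding right at the boundary.
    while soundness_error ** r > 2.0 ** (-target_bits):
        r += 1
    return r


if __name__ == "__main__":
    for bits in (16, 32):
        print(bits, repetitions(2 / 3, bits))
```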

As one application (which we bring up to compare our scheme to existing
schemes in the related work section below) consider an $\mathcal{NP}$ language
$L = \{x : \exists w : \mathcal{R}(x, w) = 1\}$. Our scheme can be used to prove
knowledge of a witness $w$ for $x \in L$ as follows: commit to $m_0 = w$ and
$m_1 = 1$ and prove that the committed values satisfy the relation
$C_x(m_0) = m_1$, where $C_x(\cdot)$ is the circuit computing the
$\mathcal{NP}$ relation $\mathcal{R}(x, \cdot)$. These proofs avoid expensive
Karp reductions (to 3-coloring or Hamiltonian cycles) used in classical proofs.


1.2 Related Work 

Our basic scheme for proving knowledge of a committed value is similar to
Stern’s zero-knowledge proof of knowledge for the syndrome decoding problem,
which can be seen as the “dual” of the LPN problem, and both are known to be
$\mathcal{NP}$-complete. Subsequent to Stern’s work, Véron proposed a
Σ-protocol for proving knowledge of an LPN secret, but as we show in the full
version of this paper [23], there is a gap in the proof of the ZK property of
his protocol. Recently, several works have extended Stern’s protocol to
construct efficient identification schemes from various lattice-based and
coding-based assumptions (see, e.g., [27] and references therein). In
particular, Cayrel et al. [10] constructed an identification scheme with
knowledge error 1/2 based on the $q$-ary syndrome decoding problem. However,
this improvement in the knowledge error adds two additional rounds to the
protocol, and thus their construction does not decrease the total number of
rounds required to reach a specified knowledge error. Very recently, Asharov
et al. [3] constructed Σ-protocols for various learning with errors (LWE)
related languages. We note that the ZK property of their protocols crucially
relies on the ability to use large noise to “smudge out” small differences in
distributions. Unfortunately, this technique does not extend to the setting of
LPN (which is the focus of this work). We finally note that all the
aforementioned works only construct ZK protocols for specific languages and,
unlike our work, do not consider the general problem of constructing ZK proofs
for circuit satisfiability.

There is a large body of work on efficient interactive or non-interactive
zero-knowledge proofs and arguments. For ZK arguments (as opposed to proofs),
where the soundness property is only required to hold against computationally
bounded malicious provers, one can construct schemes which asymptotically only
require polylogarithmic communication (e.g., the interactive argument based on
CRHFs or the non-interactive argument in the random-oracle model [33]). These
schemes rely on probabilistically checkable proofs (PCP), and are not really
practical.

The beautiful work of Ishai et al. on zero-knowledge proofs from secure
multiparty computation aims at a similar goal as this work. They show how to
construct ZK proofs from MPC. When instantiated with simple MPC protocols
like GMW, they get ZK proofs for showing knowledge of a witness $w$ such that
$\mathcal{R}(x, w) = 1$ with communication complexity $O(ts)$, where $2^{-t}$
is the soundness error and $s$ is the size of the circuit computing the
relation $\mathcal{R}(x, \cdot)$, which is the same asymptotic behavior we get
(as explained in the previous section). Using protocols relying on
sophisticated secret sharing schemes for constant-size fields based on
algebraic-geometric codes, they even get an asymptotic communication
complexity of $O(s) + \mathrm{poly}(t, \log s)$, but due to the large hidden
constants in such codes this scheme will only be more efficient than the
simpler scheme for very large circuits.

A ZK proof for any $\mathcal{NP}$ relation can of course be used to prove any
relation amongst committed values, but in general this would be rather
expensive as the computation of the opening of the commitment must be part of
the description of the relation. In contrast, our ZK proofs work directly on
committed values, and we do not pay extra for this. Proving relations amongst
committed values has been considered before. These works give very efficient
proofs for algebraic circuits over large fields, but are less suited for
circuits over very small ones, in particular for $\mathbb{Z}_2$ as in boolean
circuits. As an application, consider the case where we need to prove that
committed values satisfy $m_0 = \mathsf{AES}(m_1, m_2)$, i.e., $m_0$ is the
output of the AES block cipher under key $m_1$ on input $m_2$.

1.3 Outline 

We introduce some notation and recapitulate the basic definitions required for
this paper in Section 2. In Section 3 we present a very simple commitment
scheme based on the hardness of the LPN problem. Protocols allowing one to
prove knowledge of the content of such commitments, and relations among them,
are presented in Section 4. We finally conclude in Section 5.

2 Preliminaries 

We use bold lower-case and upper-case letters like $\mathbf{a}$, $\mathbf{A}$
to denote vectors and matrices, respectively. Probabilistic polynomial time
(PPT) algorithms are written in sans-serif letters like $\mathsf{A}$.
Calligraphic letters like $\mathcal{A}$ always denote sets. We write
$a \leftarrow \mathcal{A}$ if $a$ was drawn uniformly at random from the set
$\mathcal{A}$, $a \leftarrow \chi$ if $a$ was drawn according to some
probability distribution $\chi$, and $a \leftarrow \mathsf{A}$ if $a$ is the
output of a randomized algorithm $\mathsf{A}$.

We denote the set $\{0, 1\}$ by $\mathcal{I}$, thus $\mathcal{I}^k$ denotes the
set of strings of length $k$. The Hamming weight of
$\mathbf{a} \in \mathcal{I}^k$ is denoted by
$\|\mathbf{a}\|_1 = \sum_{i=1}^{k} \mathbf{a}[i]$. With
$\mathcal{I}^k_w = \{\mathbf{a} \in \mathcal{I}^k : \|\mathbf{a}\|_1 = w\}$ we
denote the set of all $k$-bit vectors of weight exactly $w$. The all-zeros and
all-ones vectors of length $k$ are denoted by $\mathbf{0}^k$ and
$\mathbf{1}^k$, respectively. The concatenation of vectors $\mathbf{a}$ and
$\mathbf{b}$ is written as $\mathbf{a}\|\mathbf{b}$. The symmetric group on $k$
elements (i.e., the set of all permutations on $k$ elements) is denoted
$S_k$. For $\pi \in S_k$ and $\mathbf{b} \in \mathcal{I}^k$, $\pi(\mathbf{b})$
denotes the string $\mathbf{a}$ with $\mathbf{a}[i] = \mathbf{b}[\pi(i)]$.
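
As a small illustration of this notation, the sketch below implements the Hamming weight, the action of a permutation on a bit-vector, and an enumeration of the fixed-weight set. Function names are hypothetical and indices are 0-based, unlike the 1-based indices in the text.

```python
from itertools import combinations


def hamming_weight(a):
    """Number of ones in the bit-vector a."""
    return sum(a)


def apply_perm(pi, b):
    """Return the vector a with a[i] = b[pi[i]] (0-based indices)."""
    return [b[pi[i]] for i in range(len(b))]


def weight_w_vectors(k, w):
    """All k-bit vectors of Hamming weight exactly w."""
    vectors = []
    for ones in combinations(range(k), w):
        vectors.append([1 if i in ones else 0 for i in range(k)])
    return vectors


if __name__ == "__main__":
    b = [1, 0, 0, 1]
    pi = [2, 0, 3, 1]
    a = apply_perm(pi, b)
    print(a, hamming_weight(a) == hamming_weight(b))
    print(len(weight_w_vectors(4, 2)))
```

Permuting a vector never changes its Hamming weight, which is why a permuted low-weight error can be revealed without disclosing where the original errors were.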

2.1 Commitment Schemes 

Definition 2.1. A triple of algorithms (KGen,Com,Ver) is called a commitment 
scheme if it satisfies the following: 

— On input $1^{\ell}$, the key generation algorithm KGen outputs a public
commitment key $pk$.

— The commitment algorithm Com takes as inputs a message $m$ from a message
space $\mathcal{M}$ and a commitment key $pk$, and outputs a
commitment/opening pair $(c, d)$.

— The verification algorithm Ver takes a key pk, a message m, a commitment 
c and an opening d and outputs 1 or 0. 

The commitment scheme we construct satisfies the following security properties: 

— Correctness: Ver evaluates to 1 whenever the inputs were computed by an
honest party, i.e.,

$$\Pr[\mathsf{Ver}(pk, m, c, d) = 1;\ pk \leftarrow \mathsf{KGen}(1^{\ell}),\ m \in \mathcal{M},\ (c, d) \leftarrow \mathsf{Com}(m, pk)] = 1$$

— Perfect binding: With overwhelming probability over the choice of the public
key $pk \leftarrow \mathsf{KGen}(1^{\ell})$, no commitment $c$ can be opened in
two different ways, i.e.,

$$(\mathsf{Ver}(pk, m, c, d) = 1) \wedge (\mathsf{Ver}(pk, m', c, d') = 1) \Rightarrow m = m'$$

— Computational hiding: A commitment $c$ computationally hides the committed
message: with overwhelming probability over the choice of
$pk \leftarrow \mathsf{KGen}(1^{\ell})$, for every $m, m' \in \mathcal{M}$ and
$(c, d) \leftarrow \mathsf{Com}(m, pk)$,
$(c', d') \leftarrow \mathsf{Com}(m', pk)$, the distributions of $c$ and $c'$
are computationally indistinguishable.
2.2 Learning Parity with Noise

The computational assumption underlying all our constructions is the learning 
parity with noise (LPN) assumption. Below we define the decisional version of 
LPN in a general form, not yet specifying the error distribution. 

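Definition 2.2. For $k, \ell \in \mathbb{N}$, let $\chi$ be an error
distribution over $\mathcal{I}^k$. The decisional $(\chi, \ell, k)$-LPN problem
is $(s, \epsilon)$-hard if for every distinguisher D of size $s$:

$$\left| \Pr_{\mathbf{x}, \mathbf{A}, \mathbf{e}}[\mathsf{D}(\mathbf{A}, \mathbf{A}.\mathbf{x} \oplus \mathbf{e}) = 1] - \Pr_{\mathbf{r}, \mathbf{A}}[\mathsf{D}(\mathbf{A}, \mathbf{r}) = 1] \right| \le \epsilon$$

where $\mathbf{A} \leftarrow \mathcal{I}^{k \times \ell}$,
$\mathbf{e} \leftarrow \chi$, $\mathbf{r} \leftarrow \mathcal{I}^{k}$, and
$\mathbf{x} \leftarrow \mathcal{I}^{\ell}$ is fixed and secret. The search
version is defined similarly, but we require that no D can find the secret
$\mathbf{x}$:

$$\Pr_{\mathbf{x}, \mathbf{A}, \mathbf{e}}[\mathsf{D}(\mathbf{A}, \mathbf{A}.\mathbf{x} \oplus \mathbf{e}) = \mathbf{x}] \le \epsilon$$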

In the standard definition of the LPN problem, the error distribution $\chi$ is
the Bernoulli distribution with some parameter $0 < \tau < 1/2$, i.e., every
bit $\mathbf{e}[i]$ is chosen independently and identically distributed with
$\Pr[\mathbf{e}[i] = 1] = \tau$; we will refer to this version as
$\mathsf{LPN}_\tau$. As mentioned in the introduction, for $k = \Theta(\ell)$
as used in this paper, the search version of $\mathsf{LPN}_\tau$ is the same as
the problem of decoding random linear codes, and is believed to be
exponentially hard. The search and decision versions of $\mathsf{LPN}_\tau$ are
known to be equivalent, but to show this search to decision reduction, the
number of samples $k$ in the decision version must be much larger than in the
search version (by a factor of $\Omega(1/\epsilon)$). More recently, a sample
preserving reduction has been shown [2, Lemma 4.4] (a more general treatment
of sample preserving reductions also exists in the literature).

Exact LPN. In this work we define a new version of the LPN problem, which we
call exact LPN or xLPN for short. Similar to $\mathsf{LPN}_\tau$, xLPN is
parameterized by some noise parameter $0 < \tau < 1/2$, and the (search or
decision) $\mathsf{xLPN}_\tau$ problem is defined exactly like
$\mathsf{LPN}_\tau$, except that the Hamming weight of the error vector is
exactly $\lfloor k\tau \rceil$ (not of expected weight $k\tau$ as in
$\mathsf{LPN}_\tau$). That is, $\mathbf{e}$ is sampled uniformly at random from
the set $\mathcal{I}^k_{\lfloor k\tau \rceil}$.
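
The difference between the two error distributions can be illustrated as follows. This is a sketch under assumptions: the function names are ours, rounding half up is one reading of the rounded weight, and the last helper computes the exact probability that a Bernoulli error happens to have the xLPN weight, which is the quantity that governs how search xLPN relates to search LPN.

```python
import math
import random


def exact_weight(k, tau):
    """Rounded value of k * tau (round half up; an assumed reading of the rounding)."""
    return math.floor(k * tau + 0.5)


def bernoulli_error(k, tau, rng):
    """LPN-style error: each bit is 1 independently with probability tau."""
    return [1 if rng.random() < tau else 0 for _ in range(k)]


def exact_error(k, tau, rng):
    """xLPN-style error: uniform among k-bit vectors of the exact weight."""
    e = [0] * k
    for i in rng.sample(range(k), exact_weight(k, tau)):
        e[i] = 1
    return e


def prob_bernoulli_hits_exact_weight(k, tau):
    """Pr[a Bernoulli(tau)^k error has weight exactly exact_weight(k, tau)]."""
    w = exact_weight(k, tau)
    return math.comb(k, w) * tau ** w * (1 - tau) ** (k - w)


if __name__ == "__main__":
    rng = random.Random(0)
    k, tau = 128, 0.125
    print(sum(exact_error(k, tau, rng)), sum(bernoulli_error(k, tau, rng)))
    print(prob_bernoulli_hits_exact_weight(k, tau), 1 / math.sqrt(k))
```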

In this work, we assume the hardness of decisional xLPN.¹ It is not hard to
see that search $\mathsf{xLPN}_\tau$ is hard iff search $\mathsf{LPN}_\tau$ is
hard.² Showing equivalence of the decisional versions of $\mathsf{xLPN}_\tau$
and $\mathsf{LPN}_\tau$ is more tricky. The classical search to decision
reduction for $\mathsf{LPN}_\tau$ does not work for $\mathsf{xLPN}_\tau$, but
the sample preserving reduction [2, Lemma 4.4] does. Summing up, we have

Proposition 2.3. The hardness of decisional $\mathsf{xLPN}_\tau$ (used in this
paper) is polynomially related to the hardness of search $\mathsf{LPN}_\tau$.

¹ The security of the basic commitment scheme can be based on decisional LPN,
but our Σ-protocols to prove relations amongst committed values “leak” the
weight of the error vectors. Thus, to be zero-knowledge, we need this value to
be fixed.

² Any D who outputs $\mathbf{x}$ with advantage $\epsilon$ for
$\mathsf{xLPN}_\tau$ will output the secret $\mathbf{x}$ with advantage at
least $\epsilon/\sqrt{k}$ for $\mathsf{LPN}_\tau$, as the error vector sampled
in $\mathsf{LPN}_\tau$ has weight $\lfloor k\tau \rceil$ with probability
$> 1/\sqrt{k}$, and conditioned on this being the case, the error distribution
is exactly the same as in $\mathsf{xLPN}_\tau$.

The sample preserving reduction [2, Lemma 4.4] relies on the Goldreich-Levin
theorem, and as a consequence is not very tight. Although we do not know of
significantly more efficient attacks against $\mathsf{xLPN}_\tau$ than against
$\mathsf{LPN}_\tau$, if one insists on basing the security of our schemes on
the standard $\mathsf{LPN}_\tau$ assumption in a provable manner, one must take
the loss in the reduction into account, which would result in rather large
parameters. A protocol relying on the security of the standard decisional
$\mathsf{LPN}_\tau$ assumption can be found in the full version of this paper.
The protocol given there can be extended to prove arbitrary relations amongst
committed values, in the same manner as in the case of the
$\mathsf{xLPN}_\tau$ assumption. However, this protocol is somewhat more
complicated and has a worse soundness error (4/5 as compared to 2/3), and thus
requires roughly twice the number of repetitions in order to achieve the same
knowledge error.
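
The "roughly twice" estimate can be checked with a short calculation. Writing $r(\kappa)$ for the number of repetitions needed to push a per-round error $\kappa$ below $2^{-t}$ (our notation, introduced only for this check):

```latex
\[
  r(\kappa) = \left\lceil \frac{t}{\log_2(1/\kappa)} \right\rceil,
  \qquad
  \frac{r(4/5)}{r(2/3)} \approx \frac{\log_2(3/2)}{\log_2(5/4)}
  \approx \frac{0.585}{0.322} \approx 1.82 .
\]
```

For example, $t = 16$ gives $r(4/5) = 50$ and $t = 32$ gives $r(4/5) = 100$, compared with 28 and 55 repetitions for error 2/3.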

As suggested in [2S, Section 5], replacing the LPN assumption with an
assumption where we have a fixed upper bound on the weight of the error vector
(as is the case in xLPN) would remove the completeness error (and thus allow
for more efficient instantiations) also for other LPN based schemes, like
HB type protocols. We thus think that investigating the exact hardness of the
xLPN problem is of interest beyond the realm of this work.


2.3 Zero-Knowledge Proofs of Knowledge and Σ-Protocols

Informally, a zero-knowledge proof of knowledge is a two-party protocol between
a prover P and a verifier V which allows the former to convince the latter that
it knows some secret piece of information without revealing anything about it.
A bit more precisely, in a zero-knowledge proof for a binary relation
$\mathcal{R}$, the parties have common input $y$ and the prover has private
input $w$ such that $(y, w) \in \mathcal{R}$. The protocol must then satisfy
the following three properties: (i) For an honest prover, the verifier always
accepts (completeness). (ii) For every potentially malicious verifier V* there
exists a PPT simulator only taking $y$ as an input whose output is
indistinguishable from conversations of V* with an honest prover
(zero-knowledge). (iii) From every prover P* which can make the verifier accept
with a probability larger than a threshold $\kappa$ (the knowledge error), a
$w'$ satisfying $(y, w') \in \mathcal{R}$ can be extracted efficiently in a
rewindable black-box way (proof of knowledge). For a formal definition we refer
to Bellare and Goldreich.

The protocols we are going to design in the following are all instantiations of 
the following definition: 

Definition 2.4 (Σ-Protocol). Let (P, V) be a two-party protocol, where V is
PPT, and let $\mathcal{R}$ be a binary relation. Then (P, V) is called a Σ-protocol for $\mathcal{R}$
with challenge set C, public input y and private input w, if and only if it satisfies
the following conditions:

— 3-move form: The protocol is of the following form:

• The prover P computes a commitment t and sends it to V.

• The verifier V draws a challenge $c \xleftarrow{\$} C$ and sends it to P.

• The prover sends a response s to the verifier.

• Depending on the protocol transcript (t, c, s), the verifier accepts or
rejects the proof.

The protocol transcript (t, c, s) is called accepting, if the verifier accepts the
protocol run.

— Completeness: The verifier V accepts whenever $(y, w) \in \mathcal{R}$.

— Special soundness: There exists a PPT algorithm E (the knowledge
extractor) which takes a set $\{(t, c, s_c) : c \in C\}$ of accepting transcripts with the
same commitment as input, and outputs w' such that $(y, w') \in \mathcal{R}$.

— Special honest-verifier zero-knowledge: There exists a PPT algorithm
S (the simulator) taking y and $c \in C$ as inputs, and which outputs triples
(t, c, s) whose distribution is (computationally) indistinguishable from
accepting protocol transcripts generated by real protocol runs.

It is well known that every Σ-protocol is also a proof of knowledge for the
same relation. However, while in Σ-protocols the existence of a simulator is
only required for the honest verifier, zero-knowledge proofs require this existence
for arbitrary, potentially malicious, verifiers. This can be achieved by applying
generic standard techniques to Σ-protocols, e.g., Damgård et al. [17].

We note that our definition of Σ-protocols slightly differs from the standard
definition found in the literature. For the special soundness property, it
is typically required that a valid witness can already be computed given any two
accepting conversations with the same commitment but different challenges. We
loosen this definition and only require that w' can be computed given valid
responses to all challenges for a fixed commitment t. It can easily be seen that the
aforementioned results showing that every Σ-protocol is also a proof of
knowledge still hold true. However, while for the standard definition the knowledge
error is given by $1/\#C$, it is only given by $1 - 1/\#C$ for Definition 2.4.

3 Perfectly Binding String Commitments from LPN 

Our commitment scheme is parameterized by the main security parameter $\ell \in \mathbb{N}$,
$0 < \tau < 0.25$, the message length $v \in \mathbb{N}$ and $k \in O(\ell + v)$. Finally, we set
$w = \lceil \tau k \rfloor$. The algorithms of the commitment scheme are then given as follows:

— KGen: The public commitment key consists of the matrix
$A = A' \| A'' \in \mathbb{I}^{k \times (\ell+v)}$, where $A' \xleftarrow{\$} \mathbb{I}^{k \times \ell}$ and $A'' \xleftarrow{\$} \mathbb{I}^{k \times v}$.

— Com: The commitment to a message $m \in \mathbb{I}^v$ is given by $c = A.(r\|m) \oplus e$, where
$r \xleftarrow{\$} \mathbb{I}^{\ell}$ and $e \xleftarrow{\$} \mathbb{I}^k_w$, i.e., e is a random k-bit vector of Hamming weight w.
The opening of the commitment is given by m and r.

— Ver: Given a commitment c, a message m' and a randomness r', a verifier
accepts if and only if $e' = c \oplus A.(r'\|m')$ has weight w.
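
The three algorithms above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: bit vectors are plain lists, the function names `kgen`, `com` and `ver` are our own, and the toy parameters are hypothetical and far too small to offer any security.

```python
import random

def rand_bits(n, rng):
    """Uniformly random bit vector of length n."""
    return [rng.randrange(2) for _ in range(n)]

def rand_weight(k, w, rng):
    """Uniformly random k-bit vector of Hamming weight exactly w."""
    e = [0] * k
    for i in rng.sample(range(k), w):
        e[i] = 1
    return e

def mat_vec(A, x):
    """Matrix-vector product over GF(2)."""
    return [sum(a & b for a, b in zip(row, x)) % 2 for row in A]

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def kgen(k, ell, v, rng):
    """Public key A = A' || A'': a random k x (ell + v) binary matrix."""
    return [rand_bits(ell + v, rng) for _ in range(k)]

def com(A, m, ell, w, rng):
    """Commit to m; returns the commitment c and the opening randomness r."""
    r = rand_bits(ell, rng)
    e = rand_weight(len(A), w, rng)
    return xor(mat_vec(A, r + m), e), r

def ver(A, c, m, r, w):
    """Accept iff e' = c XOR A.(r || m) has Hamming weight exactly w."""
    return sum(xor(c, mat_vec(A, r + m))) == w

# Toy parameters (hypothetical): tau = 0.125, so w = tau * k = 12.
rng = random.Random(0)
ell, v, k, w = 16, 8, 96, 12
A = kgen(k, ell, v, rng)
m = rand_bits(v, rng)
c, r = com(A, m, ell, w, rng)
print(ver(A, c, m, r, w))
```

Note that the verifier recomputes the error vector from the opening (m, r), so the committer never has to send e explicitly.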

Theorem 3.1. Let $0 < \tau < 0.25$, and $\ell, k, v \in \mathbb{N}$ be such that the decisional
$\mathrm{xLPN}_\tau$ problem (with secrets of length $\ell$ and k samples) is hard. Let $k = O(\ell + v)$
be such that with overwhelming probability a randomly chosen generator matrix
$A \in \mathbb{I}^{k \times (\ell+v)}$ of a linear code has distance larger than 2w, i.e., $\|A.x\|_1 > 2w$
for all nonzero $x \in \mathbb{I}^{\ell+v}$. Then the above commitment scheme is perfectly binding and
computationally hiding.

Proof. The required security properties can be seen as follows: 

Perfect binding. Assume, towards a contradiction, that $(m_i, r_i)$, i = 1, 2, are two different
openings for a commitment c. That is, we have that $e_i = c \oplus A.(r_i\|m_i)$ has
weight at most w for i = 1, 2. Thus we have that $e_1 \oplus e_2 = A.((r_1\|m_1) \oplus (r_2\|m_2))$
is a nonzero codeword of weight $\|e_1 \oplus e_2\|_1 \le \|e_1\|_1 + \|e_2\|_1 \le 2w$, contradicting our
assumption on the distance of the code generated by A.

Computational hiding. We have that $c = A'.r \oplus e \oplus A''.m$. By the
$\mathrm{xLPN}_\tau$ assumption $A'.r \oplus e$, and thus also c, is pseudorandom. □

4 Zero-Knowledge Proofs of Knowledge 

In this section we first construct a Σ-protocol, which on common input A and
y allows the prover to prove knowledge of a valid opening of y under the
commitment scheme presented in Section 3. The protocol borrows some basic ideas
from Stern, who gave a Σ-protocol for the syndrome decoding problem.

After presenting this basic protocol, we give two further Σ-protocols. The
first can be used to prove that committed strings satisfy any linear relation. The
second protocol can be used to show that committed strings satisfy any bitwise
relation like bitwise AND, NAND, OR or NOR. As NAND is functionally
complete, using this protocol we can construct Σ-protocols for any relation amongst
committed messages.


4.1 Proving Knowledge of a Valid Opening 

The following Σ-protocol proves knowledge of a valid opening for commitments
of the form described in the previous section, i.e., it shows possession of r, m, e
such that $y = A.(r\|m) \oplus e$ for an error satisfying $\|e\|_1 = w$. For notational
convenience we will sometimes write s to denote the vector $r\|m$.

A first idea for such a protocol (which will not quite work) is to mimic
Schnorr's protocol as follows: (1) the prover P commits to some value
$t_0 = A.v \oplus f$, (2) the verifier V sends a challenge $c \xleftarrow{\$} \{0, 1\}$, (3) the prover opens $t_0$
(i.e., sends v, f) if c = 0 and opens $t_0 \oplus y$ (i.e., sends $v \oplus s$, $f \oplus e$) if c = 1.

If in this protocol f is sampled such that it has low weight, then $e \oplus f$ leaks
information about e, and the protocol is not zero-knowledge. On the other hand,
if f is uniformly random (so $e \oplus f$ is independent of e), the protocol is not sound
(informally, all we can say is that from answers to both challenges we can extract
s', e' where $y = A.s' \oplus e'$, but e' can have arbitrary weight, and finding such
a solution is trivial). In our protocol f is chosen uniformly at random, and to
ensure soundness we use a trick from Stern. We additionally commit to a
random permutation $\pi \in S_k$ and to $\pi(f)$, $\pi(f \oplus e)$. On challenge c = 0 and c = 1
we now additionally make sure the openings are consistent with the committed
errors by opening $\pi$ and either $\pi(f)$ (if c = 0) or $\pi(f \oplus e)$ (if c = 1). Moreover we
extend the challenge space from two to three. The extra challenge c = 2 is used
to verify that the weight of $\pi(f) \oplus \pi(f \oplus e) = \pi(e)$ (and thus of e) is small. This will
ensure soundness, as from valid answers to all three challenges we can extract
s', e' where $y = A.s' \oplus e'$ and e' has low weight. Opening the commitments
to $\pi(f)$, $\pi(f \oplus e)$ on c = 2 does not hurt the ZK property, as $\pi(f)$, $\pi(f \oplus e)$
contain no information about e except its weight.

The common input to P, V is A and y, P's secret input is (e, s). The protocol
flow is then given as follows, where the commitment scheme Com(·) can be
instantiated by an arbitrary perfectly binding string commitment scheme, potentially
the scheme presented in Section 3 itself.

— P samples a permutation $\pi \xleftarrow{\$} S_k$ at random.
It then draws $v \xleftarrow{\$} \mathbb{I}^{\ell+v}$, $f \xleftarrow{\$} \mathbb{I}^k$, and then sends the following commitments
to the verifier V:

$C_0 \leftarrow \mathrm{Com}(\pi' = \pi,\ t_0 = A.v \oplus f)$

$C_1 \leftarrow \mathrm{Com}(t_1 = \pi(f))$

$C_2 \leftarrow \mathrm{Com}(t_2 = \pi(f \oplus e))$

— The verifier draws $c \xleftarrow{\$} \mathbb{Z}_3$ and sends it to P.

— Depending on the value of c, P opens the following commitments:

0. P opens $C_0, C_1$ by sending $\pi', t_0, t_1$ and the associated random coins.

1. P opens $C_0, C_2$ by sending $\pi', t_0, t_2$ and the associated random coins.

2. P opens $C_1, C_2$ by sending $t_1, t_2$ and the associated random coins.

— The verifier verifies the correctness of the openings received from the prover,
and additionally performs the following checks depending on the challenge
c:

0. V accepts, iff $t_0 \oplus \pi'^{-1}(t_1) \in \mathrm{img}\, A$ and $\pi' \in S_k$.

1. V accepts, iff $t_0 \oplus \pi'^{-1}(t_2) \oplus y \in \mathrm{img}\, A$.

2. V accepts, iff $\|t_1 \oplus t_2\|_1 = w$.
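
One run of the protocol above can be sketched as follows. Several choices here are assumptions made only for illustration: Com(·) is replaced by a salted SHA-256 hash, which is merely a stand-in and is not perfectly binding as the protocol requires; membership in img A is tested by elimination over GF(2); and all function names and toy sizes are hypothetical.

```python
import hashlib
import random

def rand_bits(n, rng):
    return [rng.randrange(2) for _ in range(n)]

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def mat_vec(A, x):
    return [sum(a & b for a, b in zip(row, x)) % 2 for row in A]

def permute(pi, f):
    """Apply pi: output position pi[i] receives f[i]."""
    g = [0] * len(f)
    for i, p in enumerate(pi):
        g[p] = f[i]
    return g

def unpermute(pi, g):
    """Inverse of permute."""
    return [g[p] for p in pi]

def in_image(A, t):
    """Is t in the column space of A over GF(2)? (XOR-basis elimination)"""
    k, basis = len(A), {}
    def reduce(x):
        while x and (x.bit_length() - 1) in basis:
            x ^= basis[x.bit_length() - 1]
        return x
    for j in range(len(A[0])):
        col = reduce(sum(A[i][j] << i for i in range(k)))
        if col:
            basis[col.bit_length() - 1] = col
    return reduce(sum(b << i for i, b in enumerate(t))) == 0

def commit(value, rng):
    """Hash-based stand-in for Com(.): returns (commitment, coins)."""
    coins = rng.getrandbits(128).to_bytes(16, "big")
    return hashlib.sha256(coins + repr(value).encode()).hexdigest(), coins

def opens(c, value, coins):
    return c == hashlib.sha256(coins + repr(value).encode()).hexdigest()

OPENED = {0: (0, 1), 1: (0, 2), 2: (1, 2)}   # challenge -> opened commitments

def prover_commit(A, e, rng):
    """First message: commitments C_0, C_1, C_2 plus the prover's state."""
    k = len(A)
    pi = list(range(k))
    rng.shuffle(pi)
    v, f = rand_bits(len(A[0]), rng), rand_bits(k, rng)
    values = [(pi, xor(mat_vec(A, v), f)), permute(pi, f), permute(pi, xor(f, e))]
    pairs = [commit(x, rng) for x in values]
    coms = [c for c, _ in pairs]
    state = [(x, coins) for x, (_, coins) in zip(values, pairs)]
    return coms, state

def respond(state, c):
    return {i: state[i] for i in OPENED[c]}

def verify(A, y, w, coms, c, resp):
    if set(resp) != set(OPENED[c]):
        return False
    if not all(opens(coms[i], val, coins) for i, (val, coins) in resp.items()):
        return False
    if c == 2:                                    # weight check on pi(e)
        return sum(xor(resp[1][0], resp[2][0])) == w
    pi, t0 = resp[0][0]
    if sorted(pi) != list(range(len(A))):         # pi' must be a permutation
        return False
    t = xor(t0, unpermute(pi, resp[c + 1][0]))    # t_1 for c = 0, t_2 for c = 1
    if c == 1:
        t = xor(t, y)
    return in_image(A, t)

def run_once(A, y, e, w, c, rng):
    coms, state = prover_commit(A, e, rng)
    return verify(A, y, w, coms, c, respond(state, c))

# Toy instance (hypothetical sizes): n plays the role of ell + v.
rng = random.Random(1)
k, n, w = 48, 12, 6
A = [rand_bits(n, rng) for _ in range(k)]
s = rand_bits(n, rng)
e = [1] * w + [0] * (k - w)
rng.shuffle(e)
y = xor(mat_vec(A, s), e)
print([run_once(A, y, e, w, c, rng) for c in (0, 1, 2)])
```

In this formulation the prover's messages depend on the witness only through e, since v itself is never sent; the verifier only tests membership in img A.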

Theorem 4.1. The above protocol is a Σ-protocol for the following relation:

$\mathcal{R}_{\mathrm{LPN}} = \{ ((A, y), (r, m, e)) : y = A.(r\|m) \oplus e \ \wedge\ \|e\|_1 = w \}$

Proof. The 3-move form required for Definition 2.4 is clear. The remaining
properties can be seen as follows.

Completeness. It is easy to see that an honest prover can always convince the
verifier. Depending on the challenge c, we get:

0. $t_0 \oplus \pi'^{-1}(t_1) = (A.v \oplus f) \oplus \pi^{-1}(\pi(f)) = A.v \in \mathrm{img}\, A$ and $\pi$ is a permutation.

1. $t_0 \oplus \pi'^{-1}(t_2) \oplus y = (A.v \oplus f) \oplus \pi^{-1}(\pi(f \oplus e)) \oplus (A.s \oplus e) = A.(v \oplus s) \in \mathrm{img}\, A$.

2. $\|t_1 \oplus t_2\|_1 = \|\pi(f) \oplus \pi(f \oplus e)\|_1 = \|\pi(f \oplus f \oplus e)\|_1 = \|\pi(e)\|_1 = \|e\|_1 = w$.

Special Soundness. Assume that we have fixed values $C_0, C_1, C_2$ and openings
for all challenges $c \in \mathbb{Z}_3$, such that the verifier accepts on all of them. Then,
by the assumed perfect binding property of the underlying commitment scheme
Com(·), we know that the openings to identical commitments must be identical
across different challenges.

By adding the verification equations for c = 0 and c = 1 we get that
$\pi'^{-1}(t_1 \oplus t_2) \oplus y \in \mathrm{img}\, A$ and thus that $y = A.s' \oplus \pi'^{-1}(t_1 \oplus t_2)$, where $s' = (r'\|m')$
is easy to compute. Now, using that $\|t_1 \oplus t_2\|_1 = w$ and that permuting a vector
does not change its weight, a valid witness for
(A, y) is given by $(r', m', \pi'^{-1}(t_1 \oplus t_2))$.

Honest-Verifier Zero-Knowledge. In the following we describe an efficient
simulator S, which for each challenge $c \in \mathbb{Z}_3$ outputs an accepting protocol transcript
the distribution of which is computationally indistinguishable from real protocol
transcripts with an honest prover for challenge c.

0. The simulator S computes $C_0$ and $C_1$ like an honest prover, and computes
$C_2$ as a commitment to 0. Then, clearly, the distribution of $C_0, C_1, \pi', t_0, t_1$
is identical to that in real protocol transcripts. Furthermore, by the
computational hiding property of the commitment scheme Com(·), the distribution
of $C_2$ is computationally indistinguishable from that in real protocol runs.

1. For c = 1, the simulator draws $\pi \xleftarrow{\$} S_k$, $a \xleftarrow{\$} \mathbb{I}^k$ and $b \xleftarrow{\$} \mathbb{I}^{\ell+v}$. It sets
$C_0 = \mathrm{Com}(\pi, A.b \oplus y \oplus a)$ and $C_2 = \mathrm{Com}(\pi(a))$. The value of $C_1$ is computed
as a commitment to 0. It is easy to see that the openings of $C_0, C_2$ pass the
verification equations. To see the correctness of their distributions note that
$t_2$ in the real protocol run and $\pi(a)$ in the simulated run are perfectly
uniform in $\mathbb{I}^k$, and the permutations are also equally distributed both times.
Concerning the opening of $C_0$, note the following: in the real protocol run,
we have $t_0 = A.v \oplus f$, where v is uniformly random, and $f = \pi^{-1}(t_2) \oplus e$;
in the simulated transcript the content of $C_0$ is given by $A.(b \oplus s) \oplus (a \oplus e)$.
Now, v and $b \oplus s$ are both uniformly random, and the terms f and $a \oplus e$ are
uniquely determined by the contents of $C_0$ and $C_2$. Thus, the distributions of
$C_0, C_2$ and their openings are perfectly simulated. The distribution of $C_1$ is
computationally indistinguishable by the assumed hiding property of Com(·).

2. Finally, for c = 2, the simulator draws $a \xleftarrow{\$} \mathbb{I}^k$ and $b \xleftarrow{\$} \mathbb{I}^k_w$ uniformly at
random. It computes $C_0$ as a commitment to 0, $C_1 = \mathrm{Com}(a)$ and
$C_2 = \mathrm{Com}(a \oplus b)$. As before, the distribution of $C_0$ is computationally
indistinguishable from real protocol runs by the hiding property of Com(·), and $C_1$
and $C_2$ as well as their openings can easily be seen to perfectly simulate the
behavior of an honest prover. □


4.2 Proving Linear Relations 

We next describe a Σ-protocol which allows one to prove that the messages
hidden within commitments $y_1, y_2, y_3$ (where $y_i = A.(r_i\|m_i) \oplus e_i$) satisfy
arbitrary linear relations. That is, $X_1.m_1 \oplus X_2.m_2 = m_3$ for arbitrary matrices
$X_1, X_2 \in \mathbb{I}^{v \times v}$. The computational and communication complexity of the
protocol is roughly the same as for proving knowledge of the three committed
messages using the protocol from the previous section; proving that they also
satisfy the linear relation comes almost for free.

The high-level idea of the protocol is as follows. P and V run the protocol from
the previous section to prove knowledge of $m_1, m_2, m_3$ for all the messages in
parallel (but using the same challenge for all three). Recall that (oversimplifying
a bit by ignoring the issue with the errors, i.e., the challenge c = 2) this protocol
goes as follows: P commits to three random messages $v_1, v_2, v_3$, and later opens
the $v_i$'s (if c = 0) or $v_i \oplus m_i$ (if c = 1). We change this protocol now a bit,
and instead of choosing $v_3$ at random we compute it as $v_3 = X_1.v_1 \oplus X_2.v_2$.
Moreover the verifier now additionally checks if $v_3 = X_1.v_1 \oplus X_2.v_2$ (if c = 0)
and if $(v_3 \oplus m_3) = X_1.(v_1 \oplus m_1) \oplus X_2.(v_2 \oplus m_2)$ (if c = 1). With these changes,
we get a stronger soundness property: not only can we extract the committed
messages $m_i$ from accepting answers to both challenges, but they will also satisfy
$m_3 = X_1.m_1 \oplus X_2.m_2$. At the same time the zero-knowledge property is not
weakened, except of course for leaking the fact that the $m_i$'s satisfy this linear
relation.
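
The two additional checks rely only on linearity over GF(2), which the following small sketch illustrates. It ignores the commitments, the errors and the challenge c = 2, and the helper names and the dimension n = 8 are hypothetical choices for illustration.

```python
import random

def rand_bits(n, rng):
    return [rng.randrange(2) for _ in range(n)]

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def mat_vec(X, x):
    """Matrix-vector product over GF(2)."""
    return [sum(a & b for a, b in zip(row, x)) % 2 for row in X]

def linear_checks(X1, X2, m, v):
    """Outcomes of the additional checks for c = 0 and c = 1.

    m = (m1, m2, m3) are the committed messages and v = (v1, v2, v3)
    the masks, where an honest prover sets v3 = X1.v1 XOR X2.v2.
    """
    lin = lambda a, b: xor(mat_vec(X1, a), mat_vec(X2, b))
    check0 = v[2] == lin(v[0], v[1])
    masked = [xor(vi, mi) for vi, mi in zip(v, m)]
    check1 = masked[2] == lin(masked[0], masked[1])
    return check0, check1

rng = random.Random(0)
n = 8
X1 = [rand_bits(n, rng) for _ in range(n)]
X2 = [rand_bits(n, rng) for _ in range(n)]
m1, m2 = rand_bits(n, rng), rand_bits(n, rng)
m3 = xor(mat_vec(X1, m1), mat_vec(X2, m2))       # messages satisfy the relation
v1, v2 = rand_bits(n, rng), rand_bits(n, rng)
v3 = xor(mat_vec(X1, v1), mat_vec(X2, v2))       # mask chosen as in the text

good = linear_checks(X1, X2, (m1, m2, m3), (v1, v2, v3))
m3_bad = [m3[0] ^ 1] + m3[1:]                    # violates the relation
bad = linear_checks(X1, X2, (m1, m2, m3_bad), (v1, v2, v3))
print(good, bad)
```

When the mask satisfies the relation, the check for c = 1 passes exactly when the messages satisfy it as well, which is the soundness argument sketched in the text.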

The protocol flow is defined as follows:

— P samples permutations $\pi_1, \pi_2, \pi_3 \xleftarrow{\$} S_k$ at random.
It then draws $v_1, v_2 \xleftarrow{\$} \mathbb{I}^v$, $u_1, u_2, u_3 \xleftarrow{\$} \mathbb{I}^{\ell}$, $f_1, f_2, f_3 \xleftarrow{\$} \mathbb{I}^k$, sets
$v_3 = X_1.v_1 \oplus X_2.v_2$ and then sends the following commitments for i = 1, 2, 3 to the verifier
V:

$C_{i0} \leftarrow \mathrm{Com}(\pi'_i = \pi_i,\ t_{i0} = A.(u_i\|v_i) \oplus f_i)$

$C_{i1} \leftarrow \mathrm{Com}(t_{i1} = \pi_i(f_i))$

$C_{i2} \leftarrow \mathrm{Com}(t_{i2} = \pi_i(f_i \oplus e_i))$

— The verifier draws $c \xleftarrow{\$} \mathbb{Z}_3$ and sends it to P.

— Depending on the value of c, P opens the following commitments:

0. P opens $C_{i0}, C_{i1}$ by sending $\pi'_i, t_{i0}, t_{i1}$ and the associated random coins.

1. P opens $C_{i0}, C_{i2}$ by sending $\pi'_i, t_{i0}, t_{i2}$ and the associated random coins.

2. P opens $C_{i1}, C_{i2}$ by sending $t_{i1}, t_{i2}$ and the associated random coins.

— The verifier verifies the correctness of the openings received from the prover,
and additionally performs the following checks depending on the challenge
c:

0. V accepts, iff $\pi'_i \in S_k$, there exist solutions $(a_i, b_i) \in \mathbb{I}^{\ell} \times \mathbb{I}^v$ to the
equations $t_{i0} \oplus \pi'^{-1}_i(t_{i1}) = A.(a_i\|b_i)$ and they satisfy
$b_3 = X_1.b_1 \oplus X_2.b_2$.

1. V accepts, iff there exist solutions $(c_i, d_i) \in \mathbb{I}^{\ell} \times \mathbb{I}^v$ to the equations
$t_{i0} \oplus \pi'^{-1}_i(t_{i2}) \oplus y_i = A.(c_i\|d_i)$ and they satisfy $d_3 = X_1.d_1 \oplus X_2.d_2$.

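2. V accepts, iff $\|t_{i1} \oplus t_{i2}\|_1 = w$.

Theorem 4.2. The above protocol is a Σ-protocol for the following relation:

$\mathcal{R}_{\mathrm{LLPN}} = \{ ((A, X_1, X_2, y_1, y_2, y_3), (r_1, r_2, r_3, m_1, m_2, m_3, e_1, e_2, e_3)) :$
$\quad \bigwedge_{i=1}^{3} (y_i = A.(r_i\|m_i) \oplus e_i \ \wedge\ \|e_i\|_1 = w) \ \wedge\ m_3 = X_1.m_1 \oplus X_2.m_2 \}$

As we can turn any commitment $y = A.(r\|m) \oplus e$ for an (unknown) message m
into a commitment for the message $m \oplus x$ as $y \oplus A.(0\|x) = A.(r\|(m \oplus x)) \oplus e$,
our protocol directly implies a protocol for affine relations

$\mathcal{R}_{\mathrm{ALPN}} = \{ ((A, X_1, X_2, \{a_i, y_i\}_{i=1}^{3}), (\{r_i, m_i, e_i\}_{i=1}^{3})) :$
$\quad \bigwedge_{i=1}^{3} (y_i = A.(r_i\|m_i) \oplus e_i \ \wedge\ \|e_i\|_1 = w) \ \wedge\ X_1.(m_1 \oplus a_1) \oplus X_2.(m_2 \oplus a_2) = m_3 \oplus a_3 \}$

In particular, this allows one to prove that $m_1 = 1^v \oplus m_2$, i.e., $m_1$ is the bitwise
negation of $m_2$. Furthermore, the protocol can be seen to directly generalize to
relations among more than 3 secret messages as well.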
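
The shift that turns a commitment to m into a commitment to m ⊕ x can be checked directly. The sketch below is an illustration under assumptions: `shift_commitment` is a hypothetical helper name, bit vectors are plain lists, and the toy sizes are not secure parameters.

```python
import random

def rand_bits(n, rng):
    return [rng.randrange(2) for _ in range(n)]

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def mat_vec(A, x):
    """Matrix-vector product over GF(2)."""
    return [sum(a & b for a, b in zip(row, x)) % 2 for row in A]

def shift_commitment(A, y, x, ell):
    """Turn a commitment to m into one to m XOR x (same r and same e)."""
    return xor(y, mat_vec(A, [0] * ell + x))

# Toy instance (hypothetical sizes).
rng = random.Random(2)
k, ell, v, w = 40, 6, 4, 5
A = [rand_bits(ell + v, rng) for _ in range(k)]
r, m = rand_bits(ell, rng), rand_bits(v, rng)
e = [1] * w + [0] * (k - w)
rng.shuffle(e)
y = xor(mat_vec(A, r + m), e)

ones = [1] * v                                   # x = 1^v gives the negation of m
y_neg = shift_commitment(A, y, ones, ell)
e_after = xor(y_neg, mat_vec(A, r + xor(m, ones)))
print(e_after == e)
```

Anyone can apply the shift, since it uses only public values; the opening randomness r and the error e are unchanged.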

Proof. We do not give a full proof here, as it is very similar to that of
Theorem 4.1. Besides technicalities, the only difference is to prove that the extracted
witnesses indeed satisfy the required linear relation.

This can be seen as follows. From the verification equations of c = 0 and c = 1
we first get that $y_i = A.(a_i \oplus c_i \| b_i \oplus d_i) \oplus \pi'^{-1}_i(t_{i1} \oplus t_{i2})$, where the second addend
has low weight by the same arguments as earlier. Using the linear relations among
the $b_i$ and the $d_i$ we further get $(b_3 \oplus d_3) = X_1.(b_1 \oplus d_1) \oplus X_2.(b_2 \oplus d_2)$. Thus,
a valid witness is given by $r'_i = a_i \oplus c_i$, $m'_i = b_i \oplus d_i$ and $e'_i = \pi'^{-1}_i(t_{i1} \oplus t_{i2})$.

To see that the protocol is still honest-verifier zero-knowledge it suffices to
note that the only additional information the verifier learns is that the random
coins used in the protocol and the secret witnesses satisfy the linear relation
which is already part of the description of the relation $\mathcal{R}_{\mathrm{LLPN}}$. The rest of the
protocol is just a parallel execution of independent instances of the protocol for
$\mathcal{R}_{\mathrm{LPN}}$. □

4.3 Proving Multiplicative Relations 

Finally, we present a protocol which can be used to prove a bitwise relation
amongst commitments $y_1, y_2, y_3$ (where $y_i = A.(r_i\|m_i) \oplus e_i$). That is, it allows
one to prove that the messages satisfy $m_3 = m_1 \circ m_2$. The main idea of the
protocol is to reduce the task of proving this multiplicative relation to a linear
one, which we showed how to solve in the last section.

In the protocol, which is given in detail below, the prover P first samples
vectors $\hat{m}_1, \hat{m}_2, \hat{m}_3 \xleftarrow{\$} \mathbb{I}^{4v}$ such that (1) $\hat{m}_3 = \hat{m}_1 \circ \hat{m}_2$ and (2) for all $(a, b) \in \mathbb{I}^2$
the number of indices $j \in \{1, \ldots, 4v\}$ satisfying $(\hat{m}_1[j], \hat{m}_2[j]) = (a, b)$ is exactly
v. Further, the prover draws a random matrix $R \xleftarrow{\$} \mathbb{I}^{v \times 4v}$ with full rank such
that each row has Hamming weight exactly 1 and such that $R.\hat{m}_i = m_i$ for
i = 1, 2, 3 (so R is a $v \times v$ permutation matrix with 3v additional zero columns).
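
The sampling step just described can be sketched as follows. This is one possible way to sample the expanded vectors and the selection matrix, written under assumptions: the function name is hypothetical, ∘ is taken to be bitwise AND, and no claim is made that this matches the exact distribution used in the paper.

```python
import random

def mat_vec(R, x):
    """Matrix-vector product over GF(2)."""
    return [sum(a & b for a, b in zip(row, x)) % 2 for row in R]

def sample_mhat_and_R(m1, m2, rng):
    """Sample expanded vectors of length 4v and a v x 4v selection matrix R.

    Every pair (a, b) occurs exactly v times among the positions of the
    first two expanded vectors, and row i of R selects one position whose
    pair equals (m1[i], m2[i]).
    """
    v = len(m1)
    pairs = [(a, b) for a in (0, 1) for b in (0, 1)] * v
    rng.shuffle(pairs)
    mh1 = [a for a, _ in pairs]
    mh2 = [b for _, b in pairs]
    mh3 = [a & b for a, b in pairs]                # bitwise product
    # Unused positions for each pair, in random order.
    free = {p: [j for j, q in enumerate(pairs) if q == p] for p in set(pairs)}
    for idx in free.values():
        rng.shuffle(idx)
    R = [[0] * (4 * v) for _ in range(v)]
    for i in range(v):
        R[i][free[(m1[i], m2[i])].pop()] = 1       # distinct column per row
    return mh1, mh2, mh3, R

m1, m2 = [1, 0, 1, 1], [1, 1, 0, 1]
mh1, mh2, mh3, R = sample_mhat_and_R(m1, m2, random.Random(3))
print(mat_vec(R, mh1) == m1, mat_vec(R, mh2) == m2)
```

Because each pair occurs v times, a fresh position is always available for every message index, and applying R to the third expanded vector yields the bitwise product of m1 and m2.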

Now P and V basically run the protocol from the previous section to prove
the linear relation $R.\hat{m}_i = m_i$, with the crucial difference that the relation R is
not known to V; instead the prover additionally sends a commitment to R with
the first message. Moreover P sends commitments to the $\hat{m}_i$'s to V.

The challenge space is extended from $\mathbb{Z}_3$ to $\mathbb{Z}_4$ (but we will later merge c = 2
and c = 3 and get back to $\mathbb{Z}_3$). If $c \in \{0, 1, 2\}$, the prover opens the commitment
to R and sends the same answer as he would in the protocol for proving
the linear relation $R.\hat{m}_i = m_i$ for the given c. If c = 3, the prover opens the
commitments to the $\hat{m}_i$'s, and V checks if $\hat{m}_3 = \hat{m}_1 \circ \hat{m}_2$.

The soundness of this protocol follows as $R.\hat{m}_i = m_i$ and $\hat{m}_3 = \hat{m}_1 \circ \hat{m}_2$
together imply the claimed statement $m_3 = m_1 \circ m_2$.

The zero-knowledge property holds as even though R together with the $\hat{m}_i$'s
determines the $m_i$'s, each by itself is completely independent of the $m_i$'s, and
we never open both.

Finally, we observe that in our protocol for proving linear relations, the verifier
does not need to know the linear relation R if c = 2. So we can collapse the
challenges c = 2 and c = 3 as described above, but not open R in this case.

Formally, the protocol flow is defined by the following algorithms:

— P samples $\hat{m}_1, \hat{m}_2, \hat{m}_3 \xleftarrow{\$} \mathbb{I}^{4v}$ such that $\hat{m}_3 = \hat{m}_1 \circ \hat{m}_2$ and such that for all
$(a, b) \in \mathbb{I}^2$ the number of indices $j \in \{1, \ldots, 4v\}$ satisfying $(\hat{m}_1[j], \hat{m}_2[j]) = (a, b)$
is exactly v. Further, the prover draws a random matrix $R \xleftarrow{\$} \mathbb{I}^{v \times 4v}$
with full rank such that each row has Hamming weight at most 1 and such
that $R.\hat{m}_i = m_i$ for i = 1, 2, 3.

In the following we denote the j-th v-bit block of $\hat{m}_i$ by $\hat{m}_i^j$, i.e.,
$\hat{m}_i^j = (\hat{m}_i[(j-1)v+1], \ldots, \hat{m}_i[jv])$. Similarly, $R^j$ denotes the matrix given by
columns $(j-1)v + 1$ to $jv$ of R.

In the remainder of this protocol description all computations are done for
i = 1, 2, 3 and j = 1, 2, 3, 4, respectively.

P draws $\hat{r}_i^j \xleftarrow{\$} \mathbb{I}^{\ell}$ and defines auxiliary images as $\hat{y}_i^j = A.(\hat{r}_i^j\|\hat{m}_i^j) \oplus \hat{e}_i^j$ for
$\hat{e}_i^j \xleftarrow{\$} \mathbb{I}^k_w$. It then samples permutations $\pi_i, \pi_i^j \xleftarrow{\$} S_k$ at random.

It then draws $v_i^j \xleftarrow{\$} \mathbb{I}^v$, $u_i, u_i^j \xleftarrow{\$} \mathbb{I}^{\ell}$, $f_i, f_i^j \xleftarrow{\$} \mathbb{I}^k$, sets $v_i = \sum_{j=1}^{4} R^j.v_i^j$ and
then sends the following commitments to the verifier V:

$C \leftarrow \mathrm{Com}(y'^1_1 = \hat{y}_1^1, \ldots, y'^4_3 = \hat{y}_3^4)$

$C_R \leftarrow \mathrm{Com}(R' = R)$

$C_{i0} \leftarrow \mathrm{Com}(\pi'_i = \pi_i,\ t_{i0} = A.(u_i\|v_i) \oplus f_i)$

$C_{i1} \leftarrow \mathrm{Com}(t_{i1} = \pi_i(f_i))$

$C_{i2} \leftarrow \mathrm{Com}(t_{i2} = \pi_i(f_i \oplus e_i))$

$C^j_{i0} \leftarrow \mathrm{Com}(\pi'^j_i = \pi^j_i,\ t^j_{i0} = A.(u^j_i\|v^j_i) \oplus f^j_i)$

$C^j_{i1} \leftarrow \mathrm{Com}(t^j_{i1} = \pi^j_i(f^j_i))$

$C^j_{i2} \leftarrow \mathrm{Com}(t^j_{i2} = \pi^j_i(f^j_i \oplus \hat{e}^j_i))$

— The verifier draws $c \xleftarrow{\$} \mathbb{Z}_3$ and sends it to P.

— Depending on the value of c, P opens the following commitments:

0. P opens $C_{i0}, C_{i1}, C^j_{i0}, C^j_{i1}, C_R$ by sending $\pi'_i, t_{i0}, t_{i1}, \pi'^j_i, t^j_{i0}, t^j_{i1}, R'$ and
the associated random coins.

1. P opens $C_{i0}, C_{i2}, C^j_{i0}, C^j_{i2}, C, C_R$ by sending $\pi'_i, t_{i0}, t_{i2}, \pi'^j_i, t^j_{i0}, t^j_{i2}, y'^j_i, R'$
and the associated random coins.

2. P opens $C_{i1}, C_{i2}, C^j_{i1}, C^j_{i2}, C$ by sending $t_{i1}, t_{i2}, t^j_{i1}, t^j_{i2}, y'^j_i$ and the
associated random coins.

— The verifier verifies the correctness of the openings received from the prover,
and additionally performs the following checks depending on the challenge
c:

0. V accepts, iff $\pi'_i, \pi'^j_i \in S_k$, there exist solutions $(a_i, b_i), (a^j_i, b^j_i) \in \mathbb{I}^{\ell} \times \mathbb{I}^v$
to the equations $t_{i0} \oplus \pi'^{-1}_i(t_{i1}) = A.(a_i\|b_i)$ and $t^j_{i0} \oplus (\pi'^j_i)^{-1}(t^j_{i1}) = A.(a^j_i\|b^j_i)$,
respectively, which satisfy $b_i = \sum_{j=1}^{4} R'^j.b^j_i$.

1. V accepts, iff R' has full rank and each row has Hamming weight at most
1, and iff there exist solutions $(a_i, b_i), (a^j_i, b^j_i) \in \mathbb{I}^{\ell} \times \mathbb{I}^v$ to the equations
$t_{i0} \oplus \pi'^{-1}_i(t_{i2}) \oplus y_i = A.(a_i\|b_i)$ and $t^j_{i0} \oplus (\pi'^j_i)^{-1}(t^j_{i2}) \oplus y'^j_i = A.(a^j_i\|b^j_i)$,
respectively, which satisfy $b_i = \sum_{j=1}^{4} R'^j.b^j_i$.

2. V accepts, iff $\|t_{i1} \oplus t_{i2}\|_1 = \|t^j_{i1} \oplus t^j_{i2}\|_1 = w$, $y'^j_i = A.(\hat{r}^j_i\|\hat{m}^j_i) \oplus \hat{e}^j_i$,
$\|\hat{e}^j_i\|_1 = w$ and $\hat{m}_1 \circ \hat{m}_2 = \hat{m}_3$.

Theorem 4.3. The above protocol is a Σ-protocol for the following relation:

$\mathcal{R}_{\mathrm{MLPN}} = \{ ((A, y_1, y_2, y_3), (r_1, r_2, r_3, m_1, m_2, m_3, e_1, e_2, e_3)) :$
$\quad \bigwedge_{i=1}^{3} (y_i = A.(r_i\|m_i) \oplus e_i \ \wedge\ \|e_i\|_1 = w) \ \wedge\ m_3 = m_1 \circ m_2 \}$

Proof. The 3-move form of the protocol is easy to see. Furthermore, completeness
directly follows from the construction and can easily be verified.

Special soundness. Concerning the special soundness of the protocol, note the
following. Using the same arguments as in the proof of Theorem 4.1 we can
extract openings $m'_i$, $r'_i$ and $e'_i$ of the $y_i$, and similarly, we get $\hat{m}'^j_i$, $\hat{r}'^j_i$ and $\hat{e}'^j_i$
which are valid openings for the $y'^j_i$. Now, by the same arguments as in
Theorem 4.2 we can further infer that $m'_i = \sum_{j=1}^{4} R'^j.\hat{m}'^j_i = R'.\hat{m}'_i$. Furthermore,
we know that $\hat{m}'_1 \circ \hat{m}'_2 = \hat{m}'_3$. Now, because of the special form of R', we can
finally infer that the same relation must also be true for the $m'_i$.

Honest-verifier zero-knowledge. We do not give a full simulator here, but only
give the intuition why the protocol is zero-knowledge. Clearly, the $\hat{m}_i$ are
uniformly random in their domain, and do not leak any information about the $m_i$,
as long as the matrix R is kept secret. Similarly, if the $\hat{m}_i$ are kept secret, the
matrix R itself is a uniformly random matrix of full rank with the specified
restriction on the weight of its rows. Computationally this still holds true even
if the $\hat{y}^j_i$ are revealed, as they are pseudorandom by Theorem 3.1. The
zero-knowledge property now follows from that of the protocol for $\mathcal{R}_{\mathrm{LLPN}}$. □

4.4 Proving Arbitrary Relations

We finally briefly explain how one can use the protocols presented in this section
to prove that committed values $m_0, m_1$ satisfy $m_0 = C(m_1)$ for an arbitrary
circuit C. Let $C_1, \ldots, C_d$ denote the layers of C, i.e., $C(m_1) = C_d(\ldots C_1(m_1) \ldots)$,
where we assume that each $C_i$ is either a linear function or a bitwise operation
(e.g., bitwise NAND). For simplicity we assume the number of input and output
wires to each $C_i$ is $\ell$, where $\ell$ is the length of the underlying LPN problem.

We use our string commitment scheme to commit to the values in the
intermediate layers, i.e., to strings $x_0, \ldots, x_d$ where $x_0 = m_1$, $x_1 = C_1(m_1)$, ..., $x_d = C(m_1)$
(note that we already have commitments to $x_0 = m_1$ and $x_d = m_0$).
Now we use our Σ-protocols to prove that $x_i = C_i(x_{i-1})$ for $i = 1, \ldots, d$.

The total communication complexity of this protocol is
$\Theta(\sum_i |C_i| \ell \log \ell) = \Theta(|C| \ell \log \ell)$; the soundness error is 2/3, and thus for most
applications it must be lowered by (parallel) repetition.

5 Conclusions and Open Problems 

We presented a very simple and efficient string commitment scheme, whose
security is based on the hardness of the LPN problem, or, equivalently, on the
hardness of decoding random linear codes. We further presented Σ-protocols
which allow one to prove arbitrary relations among secret values $m_i$, i.e.,
$m_0 = C(m_1, \ldots, m_u)$ for any circuit C. The size of a proof is only quasi-linear in the
length of the committed messages.

We introduced an "exact" version of the LPN problem which is polynomially
equivalent to the standard LPN problem. This new assumption might be of
independent interest, as basing existing LPN-based schemes on this new assumption
removes the completeness error (cf. [25] for a discussion).

It would be interesting to find protocols which already achieve a small
knowledge error in only one run, and do not rely on repetitions. Furthermore, a tighter
reduction for the hardness of the decisional xLPN problem, in particular not
relying on the Goldreich-Levin theorem, would be desirable.

Acknowledgment. We are grateful to Petros Mol for helpful discussions on
the reduction for the hardness of the xLPN problem.

Abstract. We introduce the notion of covert security with public
verifiability, building on the covert security model introduced by Aumann
and Lindell (TCC 2007). Protocols that satisfy covert security guarantee
that the honest parties involved in the protocol will notice any cheating
attempt with some constant probability $\epsilon$. The idea behind the model
is that the fear of being caught cheating will be enough of a deterrent
to prevent any cheating attempt. However, in the basic covert security
model, the honest parties are not able to persuade any third party (say,
a judge) that cheating occurred.

We propose (and formally define) an extension of the model where, when
an honest party detects cheating, it also receives a certificate that can be
published and used to persuade other parties, without revealing any
information about the honest party's input. In addition, malicious parties
cannot create fake certificates in an attempt to frame innocent parties.

Finally, we construct a secure two-party computation protocol for any
functionality f that satisfies our definition, and our protocol is almost as
efficient as the one of Aumann and Lindell. We believe that the fear of
public humiliation or even legal consequences vastly exceeds the deterrent
given by standard covert security. Therefore, even a small value of the
deterrent factor $\epsilon$ will suffice to discourage any cheating attempt.


1 Introduction 

One of the main goals of the theory of cryptographic protocols is to find
security definitions that provide the participants with meaningful guarantees and
that can, at the same time, be achieved by reasonably efficient protocols. Both
standard security notions lack one of these two properties: the level of security
offered by semi-honest secure protocols is unsatisfactory (as the only
guarantee is that security is achieved if all parties follow the protocol specification)
while malicious secure protocols (that offer security against arbitrarily behaving
adversaries) are orders of magnitude slower than semi-honest ones.

In covert security, introduced by Aumann and Lindell in 2007 [AL07], the
honest parties have the guarantee that if the adversary tries to cheat in order to
break some of the security properties of the protocol (correctness, confidentiality,
input independence, etc.) then the honest parties will notice the cheating attempt
with some constant probability $\epsilon$. Here, unlike the malicious model where the
adversary cannot cheat at all, the adversary can effectively cheat while taking
the risk of being caught. This relaxation of the security model allows protocol
designers to construct highly efficient protocols, essentially only a small factor
away from the efficiency of semi-honest protocols.

The main justification for covert security is that, in many practical applica- 
tions, the relationship between the participants of the protocol is such that the 
fear of being caught cheating is enough of a deterrent to avoid any cheating 
attempt. For example, two companies that decide to engage in a secure compu- 
tation protocol might value their reputation and the possibility of future trading 
with the other company more than the possibility of learning a few bits of in- 
formation about the other company’s input, and therefore have no incentive in 
trying to cheat in the protocol at all. 

However, a closer look at the covert model reveals that the repercussion of a 
cheating attempt is somewhat limited: Indeed, if Alice tries to cheat, the protocol 
guarantees that she will be caught by Bob with some predetermined probability, 
and so Bob will know that Alice is dishonest. Nevertheless, Bob will not be able 
to bring Alice in front of a judge or to persuade a third party Charlie that Alice 
cheated, and therefore Alice's reputation will only be hurt in Bob's eyes and no
one else's. This is due to the fact that Charlie has no way of telling apart the
situation where Alice cheated from the situation where Bob is trying to frame 
Alice to hurt her reputation: Bob can always generate fake transcripts that will 
be indistinguishable from a real interaction between a cheating Alice and Bob. 

This becomes a problem, as the fact that only Bob knows that Alice has tried 
to cheat may not be enough of a deterrent for Alice. In particular, consider the 
scenario where there is some social asymmetry between the parties, for instance 
if a very powerful company engages in a protocol with a smaller entity (e.g., a
citizen). If the citizen does not have any clear evidence of the cheating she will
not be able to get any compensation for the cheating attempt, as she will not 
be able to sue the company or persuade any other party of the misbehavior - 
who would believe her without any proof? This means that if we run a covert 
protocol between these parties, the fact that a party can detect the cheating 
may not be enough to prevent the more powerful one from attempting to cheat. 

The scenario described above can be dramatically changed if, once a party
is caught cheating, the other party receives some undeniable evidence of this
fact, and this evidence can be independently verified by any third party. We
therefore introduce the notion of covert security with public verifiability where if
a party is caught cheating, then the honest parties receive a certificate - a small
piece of evidence - that can be published and used to prove to all those who
are interested that indeed there was dishonest behavior during the interaction.
Clearly, this provides a stronger deterrent than the one given by covert security.

Intuitively, we want cheating parties to be accountable for their actions, i.e., if
a party cheats then everyone can be persuaded of this fact. At the same time, we
also need the system to be defamation-free in the sense that no honest party
can be framed, i.e., no party can produce a fake cheating certificate.

Towards Better Efficiency: Choosing the Right $\epsilon$. In order to fully
understand the benefit of covert security with public verifiability, consider the utilities
of a rational Alice, running a cryptographic protocol with Bob for some task. Let
$(U_h, U_c, U_f, U_f^{pub})$ be real numbers modeling Alice's utilities: Alice's utility is $U_h$
when she runs the protocol honestly, and so both parties learn the output and
nothing else. If Alice attempts to cheat, she will receive utility $U_c$ if the cheating
attempt succeeds. If the cheating attempt fails (i.e., Alice gets caught), the
utility received by Alice will be $U_f$ in the standard covert security setting and $U_f^{pub}$
in the setting with public verifiability. We assume that $U_c > U_h > U_f > U_f^{pub}$,
namely, Alice prefers to succeed in cheating over the outcome of an honest
execution, prefers the latter over being caught cheating, and prefers losing her
reputation in the eyes of one party over losing it publicly.

Recall that, since the protocol is $\epsilon$-deterrent, whenever Alice attempts to
cheat she will be caught with probability $\epsilon$ and succeed with probability $1 - \epsilon$.
Therefore, assuming that Bob is honest, Alice’s expected payoff is $U_h$ when she
plays the honest strategy and $\epsilon \cdot U_f + (1 - \epsilon) \cdot U_c$ when she cheats, with
$U_f \in \{U_f, U_f^{pub}\}$ depending on whether the protocol satisfies public verifiability
or not. Therefore if we set

$$\epsilon > \frac{U_c - U_h}{U_c - U_f}$$

then Alice will maximize her expected utility by playing honest. Since
$U_f^{pub} < U_f$, the right-hand side is smaller with public verifiability. This implies
that the value of $\epsilon$ needed to discourage Alice from cheating is much higher in
the standard covert security setting than in our framework.
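
As a worked illustration of this comparison, the following Python sketch evaluates the threshold obtained by rearranging $U_h > \epsilon \cdot U_f + (1 - \epsilon) \cdot U_c$ into $\epsilon > (U_c - U_h)/(U_c - U_f)$. The function names and the numeric utilities are hypothetical choices made only for this example; they do not come from the paper.

```python
from fractions import Fraction

def min_deterrent(u_c, u_h, u_caught):
    """Threshold that eps must exceed so that honesty beats cheating.

    Rearranged from  u_h > eps * u_caught + (1 - eps) * u_c.
    """
    assert u_c > u_h > u_caught
    return Fraction(u_c - u_h, u_c - u_caught)

def cheating_payoff(eps, u_c, u_caught):
    """Expected utility of a cheater who is caught with probability eps."""
    return eps * u_caught + (1 - eps) * u_c

# Hypothetical utilities (illustrative numbers only).
U_C, U_H, U_F, U_F_PUB = 10, 5, 0, -90

eps_covert = min_deterrent(U_C, U_H, U_F)      # (10 - 5) / (10 - 0)   = 1/2
eps_public = min_deterrent(U_C, U_H, U_F_PUB)  # (10 - 5) / (10 + 90)  = 1/20
```

With these assumed numbers, a publicly verifiable protocol would already deter Alice for any $\epsilon$ above 1/20, whereas the standard covert setting would need $\epsilon$ above 1/2.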

As the value of the deterrent factor $\epsilon$ determines the replication factor and
thus the efficiency of covert secure protocols, we believe that in practice using 
covert security with public verifiability will lead to an increase in efficiency, as 
the benefits obtained by the reduced replication factor will exceed the limited 
price to pay for achieving the public verifiability property on top of the covert 
secure protocol. 

Main Ideas. It is clear that no solution to our problem exists in the plain model 
and that we need to be able to publicly identify parties. We therefore set our 
study in the public-key infrastructure (PKI) model, where the keys of all parties 
are registered in some public database. Note that in practice this is not really 
an additional assumption, as most cryptographic protocols already assume the 
existence of authenticated point-to-point channels, which can essentially only be
implemented by having some kind of PKI and letting the parties sign all the
messages they exchange with each other.

At this point it might seem that the problem we are trying to solve is trivial, 
and that the solution is simply to let all parties sign all the exchanged messages
in a covert secure protocol. Here is why this naive solution does not reach our
goal: As a first problem, we need to make sure that the adversary cannot abort as 
a consequence of being caught cheating; think of a zero-knowledge (ZK) protocol 
with one bit challenge, where the prover only knows how to answer to a challenge 
c = 0. If the verifier asks for c = 1, the malicious prover has no reason to 
reply with an invalid proof and will abort instead. Surely, the honest party will 
suspect the prover of cheating but will have no certificate to show to a judge. 
The problem of an adversary aborting as an escape from being caught cheating 
was already raised in [AL07, Section 3.5], and the solution is to run the whole
cut-and-choose via an oblivious transfer (OT): here the prover (acting as a sender)
inputs openings to all possible challenges and the verifier (acting as the receiver) 
inputs his random challenge. Due to the security of the OT, the prover now 
cannot choose whether to continue or abort the protocol as a function of the 
verifier’s challenge. The prover needs to decide in advance whether to take the 
risk of being caught, or abort before the execution of the OT protocol. 

Secondly, we need to ensure that the published certificate does not leak infor- 
mation about the honest party’s input: when the honest party detects cheating, 
it computes a certificate as a function of its view, i.e., the (signed) transcript
of the protocol, its input and its random tape. Therefore, this certificate may
(even indirectly) leak information about the input of the honest party. This is 
clearly unsatisfactory and leads us to the following unfortunate situation: a party 
knows that the other party has cheated, however, in order to prove this fact to 
the public he is required to reveal to the adversary his private information. 

For the sake of concreteness, consider a protocol where Alice chooses a key pair
$(pk, sk)$ for a homomorphic encryption scheme $E$, and sends Bob $(pk, E_{pk}(x))$
where $x$ is Alice’s input. Later in the protocol, Alice and Bob use the homomorphic
properties of $E$ for a cut-and-choose; i.e., Bob sends the first message
of a ZK proof, Alice sends an encrypted challenge $E_{pk}(c)$ and Bob obliviously
computes the last message of the ZK proof for the challenge $c$, and signs the entire
transcript of the protocol. Alice finally decrypts and checks the validity of the
proof. Note that Bob cannot abort as a function of $c$ (due to the semantic security
of the encryption scheme). If Bob cheats and Alice detects it, she obtains
a proof: a signature on the (encrypted) incriminating messages. Alice can now
publish the transcript and her secret key $sk$ in order to enable the judge to verify
that Bob cheated. However, once the certificate is made public, Bob will learn
the secret key, decrypt the first ciphertext and learn $x$.

Moreover, a malicious Alice might have a strategy to compute a different secret 
key sk' that makes the signed ciphertext decrypt to some “illegal” message that 
can be used to frame an innocent Bob. These examples show that things can
easily go wrong, and motivate the need for a formal study of covert security
with public verifiability.

Signed Oblivious Transfer. As a building block for our construction we introduce
a new cryptographic primitive, which we call signed oblivious transfer.
In this primitive, the sender inputs two messages $(m_0, m_1)$ and a signature key
$sk$, and the receiver inputs a bit $b$. At the end of the protocol, the receiver will
learn the message $m_b$ together with a signature on it, while the sender learns
nothing. That is, the receiver learns: $(m_b, Sig_{sk}(b, m_b))$.

To see the importance of this tool in constructing protocols that satisfy covert 
security with public verifiability it is useful to see how it can be used to fix the 
problems with the zero-knowledge protocols described before. A very high level 
description of the signed-OT based zero-knowledge protocol is: (1) First the 
prover prepares the first message of the zero-knowledge protocol and sends it to 
the verifier together with a valid signature on it; (2) Now the prover prepares 
the answers to both challenges c = 0 and c = 1 and inputs them, together with 
his secret key, to the signed OT; (3) The verifier inputs a random choice bit c 
to the signed OT and receives the last message of the zero-knowledge protocol 
together with a valid signature on it. The verifier checks this message and, if the 
proof passes the verification, it outputs accept. On the other hand, if the proof 
is invalid, the verifier can take the transcript of the protocol and send it to
any third party as an undeniable proof that the prover attempted to cheat.

Note that this works only because $b$ is included in the signature. Had $b$ not
been signed, the prover could input the simulated opening to both branches of
the OT. This would make the (signed) transcript always look legitimate (in particular,
it would not depend on the challenge bit $b$), and the verifier could not persuade
a third party that the prover did not properly answer his challenge. Also,
note that it is not enough to run a standard OT where the prover inputs
$(m_0, Sig(0, m_0)), (m_1, Sig(1, m_1))$, as in this case the prover could cheat by sending
a valid signature on the valid opening, and no signature on the wrong opening;
it is crucial for the security of the protocol that the verifier is persuaded that
both signatures are valid, even if only one is received.
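
To make the role of the signed challenge bit concrete, the following Python sketch runs a one-bit-challenge proof of knowledge through an idealized signed OT. Everything in it is an illustrative assumption rather than the construction of this paper: the toy RSA signature (two small fixed primes, no padding, not secure), the Schnorr-style proof over a toy group of order 1019, and the helper names `signed_ot`, `cheating_prover` and `judge`. The cheating prover knows no witness and feeds the opening that is valid for $c = 0$ into both branches of the OT.

```python
import hashlib
import secrets

# --- Toy RSA signatures (insecure: small fixed primes, no padding) ---
P_RSA, Q_RSA, E = 2**61 - 1, 2**89 - 1, 65537    # two Mersenne primes
N = P_RSA * Q_RSA
D = pow(E, -1, (P_RSA - 1) * (Q_RSA - 1))         # requires Python 3.8+

def _digest(msg: tuple) -> int:
    return int.from_bytes(hashlib.sha256(repr(msg).encode()).digest(), "big") % N

def sign(msg: tuple) -> int:
    return pow(_digest(msg), D, N)

def verify(msg: tuple, sig: int) -> bool:
    return pow(sig, E, N) == _digest(msg)

# --- One-bit-challenge proof of knowledge of x with y = G^x (toy group) ---
P, Q, G = 2039, 1019, 4      # G generates the order-Q subgroup of Z_P^*

def zk_check(y: int, a: int, c: int, z: int) -> bool:
    """Accept iff G^z == a * y^c, i.e. z answers challenge c for first message a."""
    return pow(G, z, P) == (a * pow(y, c, P)) % P

def signed_ot(m0, m1, b):
    """Idealized signed OT: the receiver learns m_b and Sig(b, m_b), nothing else."""
    m = (m0, m1)[b]
    return m, sign((b, m))

def cheating_prover():
    """Prepares a first message it can only open for c = 0, and signs it."""
    z0 = secrets.randbelow(Q)
    a = pow(G, z0, P)
    first = (a, sign(("first", a)))
    return first, (z0, z0)   # same opening on both OT branches

def judge(y, first, c, z, sig) -> bool:
    """True iff the signed transcript shows an invalid answer to challenge c."""
    a, sig_a = first
    signed = verify(("first", a), sig_a) and verify((c, z), sig)
    return signed and not zk_check(y, a, c, z)

x = 123                                   # witness, unknown to the cheater
y = pow(G, x, P)                          # public statement
first, (z0, z1) = cheating_prover()
z, sig = signed_ot(z0, z1, 1)             # the verifier happens to ask c = 1
caught = judge(y, first, 1, z, sig)
z_ok, sig_ok = signed_ot(z0, z1, 0)       # had the verifier asked c = 0 instead
not_caught = judge(y, first, 0, z_ok, sig_ok)
```

Because the bit is part of the signed message, the signature obtained for $c = 0$ does not verify as an answer to $c = 1$, so in this sketch the verifier cannot relabel a valid answer to frame an honest prover.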

Our Model. Our security definition guarantees that when an honest party pub- 
lishes the certificate, the adversary cannot gain any additional information from 
this certificate even when it is combined with the adversary’s view, in a strong 
simulation sense. This, together with the fact that in the strong explicit cheat 
formulation of covert security a cheating party does not learn any information 
about the honest party’s input and output, guarantees that the certificate does 
not leak any unintentional information to anyone seeing the certificate (i.e., the
certificate can be simulated without the input/output of the honest party).

A covert secure protocol with public verifiability is composed of an “honest” 
protocol and two extra algorithms to deal with cheating situations: the first 
is used to produce a certificate when cheating is detected, and the other to
decide whether a certificate is authentic or not. The requirements for the two 
latter algorithms are the following: any time that an honest party outputs that 
the other party is corrupted, the evaluation of the verification algorithm on 
the produced certificate should output the identity of the corrupted party. In 
addition, no one should be able to produce incriminating certificates against 
honest parties. 

Organization and Results. In Section 2 we define and justify the model of
covert security with public verifiability. In Section 3 we show how to construct
a signed-OT protocol: our starting point is the very efficient OT protocol due
to Peikert, Vaikuntanathan and Waters [PVW08]. The resulting protocol is only
slightly less efficient than the protocol of PVW.

Signed-OT will also be the main ingredient in our protocol for two-party
secure computation using Yao’s garbled circuit, described in Section 4. Here
we show that for any two-party functionality $f$, there exists an efficient covert
secure protocol with $\epsilon$-deterrent and public verifiability. Our protocol is roughly
$1/\epsilon$ times slower than a semi-honest secure protocol, and has essentially the same
complexity as an $\epsilon$-deterrent secure protocol without public verifiability.

Technically, our starting point is the protocol presented in [AL07, Section 6.3]
(the variant where aborting is not considered cheating). The only differences from
the original protocol are that every call to an OT is replaced by a call to a
signed-OT, and that the circuit constructor will also send a few signatures in the right
places. We believe that this is a very positive fact as the resulting protocol is only 
slightly less efficient than the original covert secure protocol, showing how covert 
security with public verifiability offers a much greater deterrent to cheating than 
standard covert security (as a cheater can face huge loss in reputation or even 
legal consequences), while only slightly decreasing the efficiency of the protocol. 

Related Work. The idea of allowing malicious parties to cheat as long as this
is detected with significant probability can be found in several works, e.g., [FY92,
IKNP03, MNPS04], and it was first formally introduced under the name of covert
security by Aumann and Lindell [AL07]. Since then, several protocols satisfying
this definition have been constructed, for instance [HL08, GMS08, DGN10]. It is
possible to add the public verifiability property to any of these protocols. Doing
so in the most efficient way is left as future work.

2 Definitions 

Preliminaries. A function $\mu(\cdot)$ is negligible if for every positive polynomial $p(\cdot)$
and all sufficiently large $n$ it holds that $\mu(n) < 1/p(n)$. A probability ensemble
$X = \{X(a, n)\}_{a \in \{0,1\}^*; n \in \mathbb{N}}$ is an infinite sequence of random variables indexed by
$a$ and $n \in \mathbb{N}$. Usually, the value $a$ represents the parties’ inputs and $n$ the security
parameter. Two distribution ensembles $X = \{X(a, n)\}_{a \in \{0,1\}^*; n \in \mathbb{N}}$ and
$Y = \{Y(a, n)\}_{a \in \{0,1\}^*; n \in \mathbb{N}}$ are said to be computationally indistinguishable, denoted
$X \stackrel{c}{\equiv} Y$, if for every non-uniform polynomial-time algorithm $D$ there exists a
negligible function $\mu(\cdot)$ such that for every $a \in \{0,1\}^*$ and every $n \in \mathbb{N}$,

$$|\Pr[D(X(a, n)) = 1] - \Pr[D(Y(a, n)) = 1]| \le \mu(n)$$

We assume the reader to be familiar with the standard definition for secure
multiparty computation [Can00, Gol04].

Covert Security: Aumann and Lindell [AL07] present three possible definitions
for this notion of security, where the three definitions constitute a strict
hierarchy. We adopt the strongest definition that is presented, which is called
“strong explicit cheat formulation” (Section 3.4 in [AL07]).
A protocol that is secure with respect to this definition is also secure with
respect to the two other suggested definitions. Informally, in this stronger formulation,
the adversary may choose to input a special input cheat to the ideal
functionality. The ideal functionality will then flip a coin and with probability
$1 - \epsilon$ will give the adversary full control: the adversary will learn the honest
party’s input and instruct the functionality to deliver any output of its choice.
However, with probability $\epsilon$, the ideal functionality will inform the honest party
of the cheating attempt by sending him a special symbol corrupted, and crucially,
the adversary will not learn any information about the honest party’s input.
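
The informal description above can be sketched as a small Python function. This is only a simplified model written for illustration: the name `ideal_covert`, the tuple it returns, and the use of a seeded `random.Random` for the coin are assumptions of this sketch and not part of the formal definition in [AL07].

```python
import random

def ideal_covert(f, x_honest, adv_input, eps, rng, adv_output=None):
    """One run of the ideal functionality; returns (honest_view, adversary_view)."""
    if adv_input != "cheat":
        # No cheating attempt: both parties simply receive f's output.
        y = f(x_honest, adv_input)
        return y, y
    if rng.random() < eps:
        # Caught: the honest party is told, the adversary learns nothing.
        return "corrupted", None
    # Undetected: the adversary learns the honest input and fixes the output.
    return adv_output, x_honest

# Hypothetical usage with a one-bit AND functionality.
f_and = lambda a, b: a & b
rng = random.Random(0)
honest_run = ideal_covert(f_and, 1, 1, 0.5, rng)
always_caught = ideal_covert(f_and, 1, "cheat", 1.0, rng)
never_caught = ideal_covert(f_and, 1, "cheat", 0.0, rng, adv_output=0)
```

The two extreme settings of `eps` show the two branches of the coin flip: with `eps = 1.0` the honest party always outputs corrupted, and with `eps = 0.0` the adversary always obtains the honest input.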


2.1 Covert Security with Public Verifiability 

For the sake of simplicity, we will present the definition and the motivation for 
the two-party case. The definition can be easily extended to the multi-party case. 

Motivation: As discussed in the introduction, we work in the $\mathcal{F}_{PKI}$-hybrid
model where each party $P_i$ registers a verification key $vk_i$ for a signature scheme.
This key will be used to uniquely identify a party. Note that we do not require
parties to prove knowledge of their secret keys (i.e., the simulator will not know
these secret keys), so this is the weakest $\mathcal{F}_{PKI}$ formulation possible [BCNP04].

We extend the covert security model of Aumann and Lindell [AL07] and
enhance it with the public verifiability property: As in covert security, if the
adversary chooses to cheat it will be caught with probability $\epsilon$, and the honest
party outputs corrupted. However, in this latter case, the protocol in addition
provides this party an algorithm Blame to distil a certificate from its view in the
protocol. A third party who wants to verify the cheating (“the judge”) should
take the certificate and decide whether the certificate is authentic (i.e., some
cheater has been caught) or it is a fake (i.e., someone is trying to frame an
innocent party). The verification is performed using an additional algorithm, which is
called Judgement. We require the verification procedure to be non-interactive,
which will enable the honest party to send the certificate to a judge or to publish
it on a public “wall of shame”.

In addition, as our interest is mainly to protect the interest of the honest 
party, we want to make sure that the certificate of cheating does not reveal any 
unnecessary information to the verifier. Therefore, we cannot simply publish the 
view (transcript and random tape) of the honest party, as those might reveal 
some information about the input or output of the honest party. In addition, we 
need to remember that the adversary sees the certificate once it is published and 
therefore we should take care that no one will be able to learn any meaningful 
information from this certificate, even when combining it with the adversary’s 
view. To capture this fact, we use the convention that when a party detects
cheating, it creates the certificate and sends it to the adversary.

The fact that the certificate is part of the view of the adversary means that the 
simulator needs to include this certificate as a part of the view when it receives 
corrupted from the ideal functionality. Remember that in this case the simulator 
does not learn anything from the trusted party other than that the adversary got
caught, and therefore this implies that our definition ensures that the certificate
cannot reveal the private information of the honest party.

Regarding the Judgement algorithm, we require two security properties: when- 
ever an honest party outputs corrupted, running the algorithm on the certificate 
will output the identity of the corrupted party. Moreover, no adversary (even 
interacting with polynomially many honest parties) can produce a certificate for 
which the verification algorithm outputs the identity of an honest party. 


2.2 The Formal Definition 

Let $f$ be a two-party functionality. We consider the triple $(\pi, Blame, Judgement)$.
The algorithm Blame gets as input the view of the honest party (in case of cheat
detection) and outputs a certificate Cert. The verification algorithm, Judgement,
takes as input a certificate Cert and outputs the identity id (for instance, the
verification key) of the party to blame or none in the case of an invalid certificate.

The Protocol: Let $\pi$ be a two-party protocol. If an honest party detects
cheating in $\pi$ then the honest party is instructed to compute Cert = Blame(view)
and send it to the adversary.

Let $REAL_{\pi, A(z), i^*}(x_1, x_2; 1^n)$ denote the output of the honest party and the
adversary on a real execution of the protocol $\pi$ where $P_1, P_2$ are invoked with
inputs $x_1, x_2$, the adversary is invoked with an auxiliary input $z$ and corrupts
party $P_{i^*}$ for some $i^* \in \{1, 2\}$.

The Ideal World. The ideal world is exactly as in [AL07, Definition 3.4]. Let
$IDEAL_{f, S(z), i^*}(x_1, x_2)$ denote the output of the honest party, together with the
output of the simulator, on an ideal execution with the functionality $f$, where
$P_1, P_2$ are invoked with inputs $x_1, x_2$, respectively, the simulator $S$ is invoked
with an auxiliary input $z$ and the corrupted party is $P_{i^*}$, for some $i^* \in \{1, 2\}$.

Notations. Let $EXEC_{\pi, A(z)}(x_1, x_2; r_1, r_2; 1^n)$ denote the messages and the outputs
of the parties in an execution of the protocol $\pi$ with adversary $A$ on auxiliary
input $z$, where the inputs of $P_1, P_2$ are $x_1, x_2$, respectively, and the random
tapes are $(r_1, r_2)$. Let $EXEC_{\pi, A(z)}(x_1, x_2; 1^n)$ denote the probability distribution
of $EXEC_{\pi, A(z)}(x_1, x_2; r_1, r_2; 1^n)$ where $(r_1, r_2)$ are chosen uniformly at random. Let
$OUTPUT(EXEC_{\pi, A(z)}(x_1, x_2; 1^n))$ denote the output of the honest party in the execution
described above. We are now ready to define the security properties.

Definition 1 (covert security with $\epsilon$-deterrent and public verifiability).

Let $f$, $\pi$, Blame and Judgement be as above. We say that $(\pi, Blame, Judgement)$
securely computes $f$ in the presence of a covert adversary with $\epsilon$-deterrent and public
verifiability if the following conditions hold:

1. (Simulatability with $\epsilon$-deterrent:) The protocol $\pi$ (where the honest
party broadcasts Cert = Blame(view) if it detects cheating) is secure against
a covert adversary according to the strong explicit cheat formulation with
$\epsilon$-deterrent (see [AL07, Definition 3.4]).

2. (Accountability:) For every ppt adversary $A$ corrupting party $P_{i^*}$ for $i^* \in \{1, 2\}$,
there exists a negligible function $\mu(\cdot)$ such that for all sufficiently large
$x_1, x_2, z \in (\{0, 1\}^*)^3$ the following holds:

If $OUTPUT(EXEC_{\pi, A(z)}(x_1, x_2; 1^n)) = corrupted_{i^*}$ then:

$$\Pr[Judgement(Cert) = id_{i^*}] > 1 - \mu(n)$$

where Cert is the output certificate of the honest party in the execution.

3. (Defamation-Free:) For every ppt adversary $A$ controlling $i^* \in \{1, 2\}$ and
interacting with the honest party, there exists a negligible function $\mu(\cdot)$ such
that for all sufficiently large $x_1, x_2, z \in (\{0, 1\}^*)^3$:

$$\Pr[Cert^* \leftarrow A;\ Judgement(Cert^*) = id_{3-i^*}] < \mu(n)$$

Every Malicious Secure Protocol Is Also Covert Secure with Public 
Verifiability. As a sanity check, we note that any protocol that is secure against 
malicious adversaries satisfies all of the above requirements, with deterrence
factor $\epsilon = 1 - negl(n)$: aborting is the only possible malicious behavior. Therefore
the function Blame will never be invoked and the function Judgement outputs 
none on every input. In other words, given that no cheating strategy can succeed 
except with negligible probability, we have that by definition no one ever “cheats” 
and no one can be “framed” . 

3 Signed Oblivious Transfer 

As discussed in the introduction, signed oblivious transfer (signed OT) is one of
the main ingredients in our construction. For the sake of presentation, one can
think of signed OT as a protocol implementing the following functionality:

$$(\bot;\ (m_b, Sig_{sk}(b, m_b))) \leftarrow \mathcal{F}((m_0, m_1, sk), (b, vk))$$

However, it turns out that while this formulation certainly suffices for our goal, it
is not necessary for our secure two-party computation protocol in Section 4. In
particular, we do not need the signature to be computed by the ideal functionality.
We therefore use a relaxed version of the signed OT functionality, which allows
a malicious sender to choose any two strings $(\sigma_0^*, \sigma_1^*)$ and input them to the
functionality. If $(\sigma_0^*, \sigma_1^*)$ are valid signatures on the messages $(0, m_0)$ and $(1, m_1)$
respectively, the functionality delivers $(m_b, \sigma_b^*)$ to the receiver, and aborts otherwise.
In other words, we allow a corrupted sender to influence the randomness involved
in the generation of the signature, as long as it provides correct signatures for
both messages. See Functionality 1 for the formal description.

3.1 A PVW Compatible Signature Scheme 

As a first step, we will construct a (somewhat contrived) signature scheme, de- 
signed to combine efficiently with the OT protocol. Essentially, we are combining 




FUNCTIONALITY 1 (The Signed OT Functionality - $\mathcal{F}_{SignedOT}$)

The functionality is parameterized by a signature scheme $\Pi = (Gen, Sig, Ver)$.

Inputs: The receiver inputs $(vk, b)$ - a verification key together with a bit
$b \in \{0, 1\}$. The input of the sender is $(m_0, m_1, sk, \sigma_0^*, \sigma_1^*)$. An honest
sender is restricted to input $(\sigma_0^*, \sigma_1^*) = (\bot, \bot)$.

Output: If $(\sigma_0^*, \sigma_1^*) = (\bot, \bot)$ the functionality computes $\sigma = Sig_{sk}(b, m_b)$
and verifies that $Ver_{vk}((b, m_b), \sigma) = 1$. It then outputs $(m_b, \sigma)$ to the
receiver, or abort in case the verification fails.

If $(\sigma_0^*, \sigma_1^*) \neq (\bot, \bot)$ the functionality outputs $(m_b, \sigma_b^*)$ to the receiver if
$Ver_{vk}((0, m_0), \sigma_0^*) = 1$ and $Ver_{vk}((1, m_1), \sigma_1^*) = 1$, or abort otherwise.


a signature scheme $\Pi' = (Gen', Sig', Ver')$ with a computationally binding
commitment Com = (Setup, Com, Open) (we do not need the commitment to be
hiding). The verification key $vk$ of the combined scheme is the same as the
verification key $vk'$ of the original scheme. On input a message $m$, the combined
signature algorithm Sig chooses a random commitment key $ck = Setup(1^n)$ and
a string $r$, computes the commitment $(c, d) = Com_{ck}(m; r)$ and outputs:

$$(ck, d, c, Sig'_{sk}(ck, c)) \leftarrow Sig_{sk}(m). \qquad (1)$$

On input $(m, (ck, d, c, \sigma))$, the verification algorithm Ver outputs 1 if and only
if $Open_{ck}(c, d) = m$ and $Ver'_{vk}((ck, c), \sigma) = 1$. Unforgeability of the combined
scheme follows from the unforgeability of the original scheme together with the
binding property of the commitment scheme. (Note that here the signer creates
both the commitment key and the commitment itself, unlike in the
standard game for computationally binding commitments, where the receiver
needs to generate the key.) See the full version for details.

We present the commitment scheme that we use in the above template. Let
$(\mathbb{G}, q)$ be a prime order group where the DDH assumption is believed to hold.
Define the randomized function $RAND(g_0, h_0, g_1, h_1) = (u, v)$, where
$u = (g_0)^s \cdot (h_0)^t$ and $v = (g_1)^s \cdot (h_1)^t$ and $s, t \in_R \mathbb{Z}_q$. Observe that if $(g_0, h_0, g_1, h_1)$
is a DDH tuple for some $x$ (i.e., there exists an $x$ such that $g_1 = g_0^x$ and $h_1 = h_0^x$)
then $u$ is distributed at random in $\mathbb{G}$ and $v = u^x$. In case $(g_0, h_0, g_1, h_1)$
is not a DDH tuple (i.e., $\log_{g_0} g_1 \neq \log_{h_0} h_1$) then the pair $(u, v)$ is distributed
uniformly at random in $\mathbb{G}^2$. See [PVW08] for more details. The commitment
scheme is as follows:

- The Setup Algorithm Setup: On input security parameter $1^n$, the setup
chooses a DDH tuple $(g_0, h_0, g_1, h_1)$ in $\mathbb{G}$ and defines $ck = (g_0, h_0, g_1, h_1)$.

- The Commitment Algorithm $Com_{ck}$: On input message $(b, m) \in \{0, 1\} \times \mathbb{G}$,
the Com algorithm chooses a random $r \in_R \mathbb{Z}_q$ and computes $(g, h) = (g_b, h_b)^r$
and $(u_b, v_b) = RAND(g_b, g, h_b, h)$, $w_b = m \cdot v_b$, $(u_{1-b}, w_{1-b}) \in_R \mathbb{G}^2$.
Then, it defines $c = (g, h, u_0, w_0, u_1, w_1)$ and the decommitment value
$d = (r; (b, m))$.

- The Opening Algorithm $Open_{ck}(c, d)$: On input key $ck = (g_0, h_0, g_1, h_1)$,
commitment $c = (g, h, u_0, w_0, u_1, w_1)$ and decommitment $d = (r; (b, m))$, the
opening algorithm checks that $(g, h) = (g_b, h_b)^r$ and $w_b = m \cdot u_b^r$. If so it
outputs $(b, m)$, otherwise $\bot$.
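
The three algorithms can be exercised with a short Python sketch over a toy group. The parameters ($P = 2039$, $Q = 1019$, generator 4) are far too small for any security and are an assumption of this example, as are the function names. The sketch passes the arguments to `RAND` in the order $(g_b, h_b, g, h)$, since under the definition $u = a^s \cdot b^t$, $v = c^s \cdot d^t$ this is the order that yields $v_b = u_b^r$, which the opening check $w_b = m \cdot u_b^r$ relies on.

```python
import secrets

P, Q, G = 2039, 1019, 4     # toy group: order-Q subgroup of Z_P^*, P = 2Q + 1

def rand_elem():
    """Random element of the subgroup generated by G."""
    return pow(G, secrets.randbelow(Q), P)

def RAND(a, b, c, d):
    """(u, v) = (a^s * b^t, c^s * d^t) for random s, t in Z_Q."""
    s, t = secrets.randbelow(Q), secrets.randbelow(Q)
    return (pow(a, s, P) * pow(b, t, P)) % P, (pow(c, s, P) * pow(d, t, P)) % P

def setup():
    """Commitment key ck = (g0, h0, g1, h1) with g1 = g0^alpha, h1 = h0^alpha."""
    g0, h0 = rand_elem(), rand_elem()
    alpha = 1 + secrets.randbelow(Q - 1)
    return (g0, h0, pow(g0, alpha, P), pow(h0, alpha, P))

def commit(ck, b, m):
    """Commit to (b, m); returns (c, d) with d = (r, (b, m))."""
    gb, hb = ck[2 * b], ck[2 * b + 1]
    r = 1 + secrets.randbelow(Q - 1)
    g, h = pow(gb, r, P), pow(hb, r, P)
    u, v = RAND(gb, hb, g, h)                 # gives v = u^r
    real = (u, (m * v) % P)                   # (u_b, w_b)
    dummy = (rand_elem(), rand_elem())        # (u_{1-b}, w_{1-b})
    first, second = (real, dummy) if b == 0 else (dummy, real)
    return (g, h, first[0], first[1], second[0], second[1]), (r, (b, m))

def open_commitment(ck, c, d):
    """Return (b, m) if d opens c under ck, and None otherwise."""
    r, (b, m) = d
    gb, hb = ck[2 * b], ck[2 * b + 1]
    g, h, u0, w0, u1, w1 = c
    ub, wb = (u0, w0) if b == 0 else (u1, w1)
    ok = (g, h) == (pow(gb, r, P), pow(hb, r, P)) and wb == (m * pow(ub, r, P)) % P
    return (b, m) if ok else None

ck = setup()
m = pow(G, 77, P)                             # a message in the group
c, d = commit(ck, 1, m)
opened = open_commitment(ck, c, d)
wrong = open_commitment(ck, c, (d[0], (1, (m * G) % P)))
```

Opening with the same randomness but a different message fails the check on $w_b$, which is the behaviour the binding argument depends on.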

Claim 1. Assuming computing discrete logarithms is hard in $\mathbb{G}$, the scheme
(Setup, Com, Open) is computationally binding.

Proof Sketch: To see that the scheme is binding, observe that there is a unique
mapping between $r$ and $(b, m)$ in the following way: given a commitment
$c = (g, h, u_0, w_0, u_1, w_1)$ and the decommitment $r$, we search for the $b$ for which
$(g, h) = (g_b, h_b)^r$. Given $(r, b)$, the message $m$ is defined as $w_b \cdot (u_b)^{-r}$. Therefore, the only
way that an adversary can break the binding property of a given commitment
$c$ is by finding $r'$ for which $(g, h) = (g_{1-b}, h_{1-b})^{r'}$. But, to find such an $r'$ the
adversary needs to break the discrete logarithm assumption. □

Our PVW compatible signature scheme $\Pi = (Gen, Sig, Ver)$ is the combination
of the signature scheme $\Pi'$ and the commitment scheme Com as defined in
Eq. (1). We conclude:

Corollary 1. If $\Pi' = (Gen', Sig', Ver')$ is a signature scheme that is existentially
unforgeable under an adaptive chosen-message attack and the discrete logarithm
problem is hard in $(\mathbb{G}, g_0, q)$, then $\Pi = (Gen, Sig, Ver)$ is also existentially
unforgeable under an adaptive chosen-message attack.


3.2 PVW-Based Signed OT 

We present the protocol for signed OT in Protocol 1, combining the PVW OT
protocol with the signature scheme described above. Like the original OT protocol
[PVW08], our signed OT protocol can be extended in the straightforward
way to a 1-out-of-$\ell$ signed OT (see the full version). Note that the overall protocol
is just the DDH-based instantiation of the PVW OT framework with the
following differences (clearly marked in the protocol description): (1) The sender
chooses the “CRS” $(g_0, h_0, g_1, h_1)$ and proves that it is a DDH tuple. (Remember
that in this case the receiver’s message hides his choice bit statistically.) (2) The
sender signs all the messages it sends to the receiver.

Note that the Com algorithm is distributed, in the sense that both parties 
contribute to the input and randomness: in particular the receiver chooses b 
while the sender specifies (mo, mi) without knowing which message is going to 
be chosen. 

Lemma 1. Let $\Pi = (Gen, Sig, Ver)$ be the PVW-compatible signature scheme
defined above. Then, Protocol 1 securely implements the $\mathcal{F}_{SignedOT}$ functionality
in the presence of a malicious adversary.

PROTOCOL 1 (Signed One-out-of-Two OT Protocol)

Setup: This step can be done once and reused for multiple runs of the OT:
The sender S chooses $(g_0, h_0) \in_R \mathbb{G}^2$ and a random $\alpha \in_R \mathbb{Z}_q$ and computes
$g_1 = g_0^\alpha$ and $h_1 = h_0^\alpha$. The sender sends $(g_0, h_0, g_1, h_1)$ to the receiver
R and gives a zero-knowledge proof-of-knowledge that $(g_0, h_0, g_1, h_1)$ is a
DDH tuple.

Choose: R chooses a random $r \in_R \mathbb{Z}_q$, computes $g = (g_b)^r$, $h = (h_b)^r$ and
sends $(g, h)$ to S.

Transfer: The sender operates in the following way:

1. S computes $(u_0, v_0) = RAND(g_0, g, h_0, h)$ and
$(u_1, v_1) = RAND(g_1, g, h_1, h)$;

2. S sends R the values $(u_0, w_0)$ where $w_0 = v_0 \cdot m_0$, and $(u_1, w_1)$ where
$w_1 = v_1 \cdot m_1$;

3. (diff) S sends to the receiver

$\sigma' = Sig'_{sk'}((g_0, h_0, g_1, h_1), (g, h, u_0, w_0, u_1, w_1))$;

Retrieve: (diff) Let $vk = vk'$. R checks that $\sigma'$ is a valid signature on the
transcript of the protocol. If so, R outputs: $m_b = w_b \cdot (u_b)^{-r}$ and (diff)

$\sigma = ((g_0, h_0, g_1, h_1), (r; (b, m_b)), (g, h, u_0, w_0, u_1, w_1), \sigma')$.
Otherwise, it outputs abort.
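
The message flow of the underlying OT can be checked with a small Python sketch. It deliberately omits the zero-knowledge proof of the Setup step and the signature $\sigma'$, uses a toy group that offers no security, and takes `RAND(a, b, c, d)` to mean $u = a^s \cdot b^t$, $v = c^s \cdot d^t$ with the arguments passed as $(g_i, h_i, g, h)$ so that $v_b = u_b^r$; all of these, together with the function names, are assumptions made for illustration. The last function mirrors the observation used in the proof sketch that, when the receiver chose $b = 1$, knowledge of $\alpha$ opens branch 0 with exponent $\alpha \cdot r$.

```python
import secrets

P, Q, G = 2039, 1019, 4      # toy group: order-Q subgroup of Z_P^*

def rand_exp():
    return 1 + secrets.randbelow(Q - 1)

def RAND(a, b, c, d):
    s, t = secrets.randbelow(Q), secrets.randbelow(Q)
    return (pow(a, s, P) * pow(b, t, P)) % P, (pow(c, s, P) * pow(d, t, P)) % P

def sender_setup():
    """Setup: CRS (g0, h0, g1, h1) with g1 = g0^alpha, h1 = h0^alpha (ZK proof omitted)."""
    g0, h0 = pow(G, rand_exp(), P), pow(G, rand_exp(), P)
    alpha = rand_exp()
    return (g0, h0, pow(g0, alpha, P), pow(h0, alpha, P)), alpha

def receiver_choose(crs, b):
    """Choose: (g, h) = (g_b, h_b)^r for a fresh random r."""
    r = rand_exp()
    return (pow(crs[2 * b], r, P), pow(crs[2 * b + 1], r, P)), r

def sender_transfer(crs, gh, m0, m1):
    """Transfer: (u0, w0, u1, w1) with w_i = v_i * m_i (signature omitted)."""
    g0, h0, g1, h1 = crs
    g, h = gh
    u0, v0 = RAND(g0, h0, g, h)
    u1, v1 = RAND(g1, h1, g, h)
    return (u0, (v0 * m0) % P, u1, (v1 * m1) % P)

def receiver_retrieve(msg, b, r):
    """Retrieve: m_b = w_b * u_b^(-r)."""
    u, w = (msg[0], msg[1]) if b == 0 else (msg[2], msg[3])
    return (w * pow(u, -r, P)) % P            # negative exponent: Python 3.8+

def extract_both(msg, alpha, r):
    """With b = 1 and trapdoor alpha, alpha*r opens branch 0 and r opens branch 1."""
    u0, w0, u1, w1 = msg
    return (w0 * pow(u0, -(alpha * r), P)) % P, (w1 * pow(u1, -r, P)) % P

crs, alpha = sender_setup()
m0, m1 = pow(G, 11, P), pow(G, 22, P)         # two messages in the group
gh, r = receiver_choose(crs, 1)
msg = sender_transfer(crs, gh, m0, m1)
received = receiver_retrieve(msg, 1, r)
both = extract_both(msg, alpha, r)
```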


Proof Sketch: As discussed in Corollary 1, $\sigma$ is a proper signature on the
message $(b, m_b)$, and therefore the correct functionality is implemented when
both parties are honest.

The proof of security of the underlying OT protocol is by now standard and
can be found in [PVW08, HL10]. When the receiver is corrupted, the simulator
plays as an honest sender except that it chooses a non-DDH tuple in
step “Setup” (i.e., some $(g_0, g_0^x, g_1, g_1^y)$ with $x \neq y$) and then, given the pair $(g, h)$ and using
the trapdoor $(x, y)$, it can extract the receiver’s input bit $b$ by checking whether $h$
equals $g^x$ or $g^y$. It then sends $b$ to the functionality $\mathcal{F}_{SignedOT}$. Clearly, adding the
signature $\sigma'$ does not break any security property of the original OT protocol (it
is easy to see that any attack on this protocol can be reduced to an attack on the
original protocol, where the reduction will simply produce this extra signature).

For the case of a corrupted sender, the simulator plays as an honest receiver
(with $b = 1$) except that it extracts $\alpha$ from the zero-knowledge proof in step
“Setup”. Using this trapdoor, it can compute both messages $m_0, m_1$ (as in the
proof of the original protocol). Then, it computes the two signatures $\sigma_0, \sigma_1$ as
follows:

$\sigma_0 = ((g_0, h_0, g_1, h_1), (\alpha \cdot r, (0, m_0)), (g, h, u_0, w_0, u_1, w_1), \sigma')$

$\sigma_1 = ((g_0, h_0, g_1, h_1), (r, (1, m_1)), (g, h, u_0, w_0, u_1, w_1), \sigma')$

In order to see that these are valid signatures on $(0, m_0), (1, m_1)$ respectively,
recall that $(g, h) = (g_1, h_1)^r = (g_0, h_0)^{\alpha \cdot r}$. This implies that $\alpha \cdot r$ is a valid
opening of $c$ for $(0, m_0)$ whereas $r$ is the opening of $c$ for $(1, m_1)$. Finally, it is
easy to see that the distribution of the constructed signatures is the same as
in the real execution. □

4 Two-Party Computation with Publicly Verifiable 
Covert Security 

The protocol is an extension of the two-party protocol of [AL07], which is based
on Yao’s garbled circuit protocol for secure two-party computation. We will start 
with an informal discussion of the ways that a malicious adversary can cheat 
in Yao’s protocol and we will present the (existing) countermeasures to make 
sure that such attacks will be detected with significant probability, thus leading 
to covert security. Finally we will describe how to add the public verifiability 
property on top of this. The ways that a malicious adversary can cheat in Yao’s 
protocol are as follows: 

1. Constructing bad circuits: To prevent Pi from constructing a circuit that 
computes a function different than /, Pi constructs l independent garbled 
circuits and P2 checks t— 1 of them. Therefore if Pi cheats in the construction 
of the circuits, P2 will notice this with probability > 1 — 1/L To make sure 
Pi cannot abort if it is challenged on an incorrect circuit, we run the cut- 
and-choose through a 1 -out-of-f? signed OT, so that P2 will always receive 
some (signed) opening of the circuits that can be used to prove a cheating 
attempt to a third party. 

2. Selective failure attack on $P_2$’s input values: When $P_2$ retrieves its
keys (using the OT protocol), $P_1$ may take a guess $g$ at one of the input
bits of $P_2$. Then, it may use some string $r$ instead of the valid key $k_{1-g}$ as
input to the OT protocol. Now, if $P_1$ guesses correctly and the input bit
indeed equals $g$, $P_2$ receives $k_g$ and does not notice that anything
was wrong. However, if the guess is incorrect, $P_2$ receives $r$
instead of $k_{1-g}$, which is an invalid key, and thus it aborts. In both cases, the
way $P_2$ reacts completely reveals this input bit. This problem can be fixed
by computing a different circuit, where $P_2$’s input is an $m$-out-of-$m$ linear
secret sharing of each one of the input bits of $P_2$. Now every $m - 1$ input bits
of $P_2$ to the protocol are uniformly random, and therefore $P_1$ will get caught
with probability $1 - 2^{-m+1}$ if it attempts to guess (the encoding of) an input
bit. By using a signed OT we ensure that $P_2$ receives a certificate on the
wrong keys if $P_1$ cheats.
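
The $m$-out-of-$m$ linear sharing used against the selective failure attack can be sketched as follows. This is a minimal illustration in Python, not code from the protocol description: the function names `share_bit` and `reconstruct_bit` are hypothetical, and a real implementation would share whole $n$-bit strings rather than single bits.

```python
import secrets
from functools import reduce
from operator import xor


def share_bit(bit: int, m: int) -> list[int]:
    """Split one input bit into m XOR shares (an m-out-of-m linear sharing)."""
    if bit not in (0, 1) or m < 1:
        raise ValueError("bit must be 0 or 1 and m must be positive")
    # The first m - 1 shares are uniform and independent of the input bit.
    shares = [secrets.randbelow(2) for _ in range(m - 1)]
    # The last share is fixed so that all m shares XOR to the original bit.
    shares.append(reduce(xor, shares, bit))
    return shares


def reconstruct_bit(shares: list[int]) -> int:
    """Recover the shared bit as the XOR of all shares."""
    return reduce(xor, shares, 0)
```

Because any $m - 1$ of the shares are uniformly random, a guess about a single share tells the guesser nothing about the underlying input bit; only all $m$ shares together determine it.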

Let Com denote a perfectly-binding commitment scheme, where $\mathsf{Com}(x; r)$ denotes
a commitment to $x$ using randomness $r$. $(\mathsf{Gen}_{ENC}, \mathsf{Enc}, \mathsf{Dec})$ is a semantically secure
symmetric encryption scheme. $(\mathsf{Gen}, \mathsf{Sig}, \mathsf{Ver})$ is a signature scheme that is
existentially unforgeable under an adaptive chosen-message attack. Note that it is crucial that
every message is signed together with some extra information about the role of this
message (i.e., with unique identifiers for the parties executing the protocol, the
instance of the protocol, the type of message in the protocol, the gate or wire label
the message is associated with, etc.), but we will omit this extra information in
the description of our protocol for the sake of simplicity.

1 We assume the reader to be familiar with Yao’s garbled circuit protocol. See [LP09]
for more details and a full proof of security.

PROTOCOL 2 [Two-Party Secure Computation] 

Inputs: Party $P_1$ has input $x_1$ and party $P_2$ has input $x_2$, where $|x_1| = |x_2|$. In
addition, both parties have parameters $\ell$ and $m$, and a security parameter $n$. For
simplicity, we assume that the inputs have length $n$. (diff) Party $P_1$
knows a secret key $sk$ for a signature scheme, and $P_2$ has received the corresponding
verification key $vk$ from $\mathcal{F}_{PKI}$.

Auxiliary Input: Both parties have the description of a circuit $C$ for inputs
of length $n$ that computes the function $f$. The input wires associated with $x_1$ are
$w_1, \ldots, w_n$ and the input wires associated with $x_2$ are $w_{n+1}, \ldots, w_{2n}$.

The Protocol 

1. Parties $P_1$ and $P_2$ define a new circuit $C'$ that receives $m + 1$ inputs
$x_1, x_2^1, \ldots, x_2^m$, each of length $n$, and computes the function $f(x_1, \bigoplus_{i=1}^{m} x_2^i)$. Note
that $C'$ has $n + mn$ input wires. Denote the input wires associated with $x_1$ by
$w_1, \ldots, w_n$, and the input wires associated with $x_2^i$ by $w_{n+(i-1)m}, \ldots, w_{n+im}$
for each $i$.

2. $P_2$ chooses $m - 1$ strings $x_2^1, \ldots, x_2^{m-1}$ uniformly and independently at
random from $\{0, 1\}^n$, and defines $x_2^m = \left(\bigoplus_{i=1}^{m-1} x_2^i\right) \oplus x_2$, where $x_2$ is $P_2$’s original
input. Observe that $\bigoplus_{i=1}^{m} x_2^i = x_2$.

3. For each $i = 1, \ldots, mn$ and $\beta = 0, 1$, party $P_1$ chooses $\ell$ encryption keys by
running $\mathsf{Gen}_{ENC}(1^n)$ $\ell$ times. Denote the $j$th key associated with a given
$i$ and $\beta$ by $k^j_{w_{n+i},\beta}$.

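4. $P_1$ and $P_2$ invoke $mn$ times the (diff) $\mathcal{F}^{\mathrm{signedOT}}$ functionality with the
following inputs: in the $i$th execution, party $P_1$ inputs the pair

$$\left( [k^1_{w_{n+i},0}, \ldots, k^\ell_{w_{n+i},0}],\ [k^1_{w_{n+i},1}, \ldots, k^\ell_{w_{n+i},1}] \right)$$

and party $P_2$ inputs the bit $x_2^i$ ($P_2$ receives the keys $[k^1_{w_{n+i},x_2^i}, \ldots, k^\ell_{w_{n+i},x_2^i}]$
and a signature on them as output). If $P_2$’s output in the OT is $\mathsf{abort}_1$, then it
outputs $\mathsf{abort}_1$ and halts.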

5. Party $P_1$ constructs $\ell$ garbled circuits $GC_1, \ldots, GC_\ell$ using independent
randomness for the circuit $C'$ described above. The keys for the input wires
$w_{n+1}, \ldots, w_{n+mn}$ in the garbled circuits are taken from above (i.e., the keys
associated with $w_{n+i}$ in $GC_j$ are $k^j_{w_{n+i},0}$ and $k^j_{w_{n+i},1}$). The keys for the input wires
$w_1, \ldots, w_n$ are chosen randomly, and are denoted in the same way. $P_1$ sends
the $\ell$ garbled circuits to $P_2$ (diff) together with a signature on them.

2 The description of the protocol is almost verbatim from [AL07] to help the reader
identify the few (clearly marked) differences between our protocol and the original
protocol.
6. $P_1$ commits to the keys associated with its inputs. That is, for every
$i = 1, \ldots, n$, $\beta = 0, 1$ and $j = 1, \ldots, \ell$, party $P_1$ computes (diff):

$$c^j_{w_i,\beta} = \mathsf{Com}(k^j_{w_i,\beta}; r^j_{i,\beta}), \qquad \sigma^j_{w_i,\beta} = \mathsf{Sig}_{sk}(c^j_{w_i,\beta})$$

The commitments and the signatures are sent as $\ell$ vectors of pairs (one vector
for each circuit); in the $j$th vector the $i$th pair is $\{(c^j_{w_i,0}, \sigma^j_{w_i,0}), (c^j_{w_i,1}, \sigma^j_{w_i,1})\}$
in a random order (the order is randomly chosen independently for each
pair). (diff) Party $P_2$ verifies that all the signatures are correct. If not, it
halts and outputs $\mathsf{abort}_1$.

7. $P_2$ chooses a random index $\gamma \in_R \{1, \ldots, \ell\}$.

8. (diff) $P_1$ and $P_2$ engage in a 1-out-of-$\ell$ signed OT, where $P_2$ inputs $\gamma$ and, for
$i = 1, \ldots, \ell$, $P_1$ inputs as the $i$th message of the signed OT all of the keys for the
input wires in all garbled circuits except for $GC_i$, together with the associated
mappings and the decommitment values. $P_1$ also sends decommitments to the
input keys associated with its input for the circuit $GC_i$.

$P_2$ receives the openings for $\ell - 1$ circuits (all but $GC_\gamma$) together with a
signature on them. $P_2$ also receives the decommitments and the keys associated
with $P_1$’s input for circuit $GC_\gamma$, together with signatures on them. If any of
the signatures are incorrect, it halts and outputs $\mathsf{abort}_1$.

9. $P_2$ checks the following:

- The keys it received for all $GC_j$, $j \neq \gamma$, indeed decrypt the circuits,
and the decrypted circuits are all $C'$. (diff) If not, it adds key = wrongCircuit
to its view.

- The decommitment values correctly open all the commitments $c^j_{w_i,\beta}$
that were received, and these decommitments reveal the keys $k^j_{w_i,\beta}$ that
were sent for $P_1$’s wires. (diff) If not, it adds key = wrongDecommitment
to its view.

- The keys received in the signed OT in Step 4 match the appropriate
keys that it received in the opening. (diff) If not, it adds key =
selectiveOTattack to its view.

If all checks pass, it proceeds to the next step. Otherwise (diff), $P_2$ computes Cert =
Blame(view$_2$) (see the description of Blame for its output on the different key
values), publishes Cert and outputs $\mathsf{corrupted}_1$.

10. $P_2$ checks that the values received are valid decommitments to the
commitments received above. If not, it outputs $\mathsf{abort}_1$. If yes, it uses the keys to
compute $C'(x_1, z_2) = C'(x_1, x_2^1, \ldots, x_2^m) = C(x_1, x_2)$, and outputs the result.
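
The cut-and-choose challenge of Steps 7 to 9 can be checked by direct enumeration. The following minimal Python sketch assumes that $P_1$ garbles exactly one of the $\ell$ circuits incorrectly and that $P_2$ picks $\gamma$ uniformly; the function name `detection_probability` is hypothetical and the sketch models only the choice of $\gamma$, not the garbling itself.

```python
from fractions import Fraction


def detection_probability(num_circuits: int, bad_index: int) -> Fraction:
    """Exact chance that the single bad circuit is among the opened ones.

    P2 opens every circuit except GC_gamma, so the bad circuit escapes
    inspection only when gamma equals bad_index.
    """
    if not 1 <= bad_index <= num_circuits:
        raise ValueError("bad_index must lie in 1..num_circuits")
    caught = sum(
        1 for gamma in range(1, num_circuits + 1) if gamma != bad_index
    )
    return Fraction(caught, num_circuits)
```

Whatever position the bad circuit occupies, the result is $(\ell - 1)/\ell = 1 - 1/\ell$, which is the detection bound quoted for bad circuits.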

Theorem 1. Let $\ell$ and $m$ be parameters in the protocol that are both upper-bounded
by $\mathrm{poly}(n)$, set $\epsilon = (1 - 1/\ell)(1 - 2^{-m+1})$, let $f$ be a probabilistic
polynomial-time function, and let $\pi$ denote Protocol 2. Then, assuming the
DDH assumption and the security of the commitment scheme, signature scheme and
symmetric encryption scheme as described above, $(\pi, \mathsf{Blame}, \mathsf{Judgement})$ securely
computes $f$ in the presence of covert adversaries with $\epsilon$-deterrent and public
verifiability (i.e., it satisfies the definition of covert security with public verifiability).
ALGORITHM 1 (The Blame Algorithm - Blame)

Input: The view of an honest party, view, containing an error tag key.

Output: A certificate Cert = (id, key, message, $\sigma$).

The Algorithm:

- Case 1: key = wrongCircuit: Let $j$ be the smallest index such that the garbled
circuit $GC_j$ is not a garbling of $C'$. Let message be the commitment to
$GC_j$ concatenated with the opening obtained via the 1-out-of-$\ell$ signed OT in
Step 8, and let $\sigma$ be the signature on these messages.

- Case 2: key = wrongDecommitment: Let message be $(c, x, r)$, where $c$ is a
commitment with $c \neq \mathsf{Com}(x; r)$, and let $\sigma$ be the signatures on $c$ and $(x, r)$.

- Case 3: key = selectiveOTattack: Let message be a garbled circuit $GC$ and
two keys to one of its input gates. Let $\sigma$ be the signature on the circuit
and the signatures on the keys obtained in Step 8.

In any other case, output $\bot$.


ALGORITHM 2 (The Public Verification Algorithm - Judgement)

Input: A certificate Cert = (id, key, message, $\sigma$).

Output: The identity id or none.

The Algorithm: If $\sigma$ is not a valid signature on the message message
according to the verification key $vk_{id}$, halt and output none. Else:

- Case 1: key = wrongCircuit: Parse message as a garbled circuit $GC$ and
the randomness $r$ used to generate it. If $GC$ is not an encryption of the
circuit computing $C'$ using randomness $r$, output id; otherwise output none.

- Case 2: key = wrongDecommitment: Parse message as $(c, x, r)$. If
$c \neq \mathsf{Com}(x; r)$, output id; otherwise output none.

- Case 3: key = selectiveOTattack: Parse message as a circuit $GC$ and two
keys $k^i, k^j$ for an input gate $g$ of the circuit $GC$. If $k^i, k^j$ do not decrypt
the gate $g$, output id; otherwise output none.
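
To make the shape of the public verification concrete, here is a minimal Python sketch of the wrongDecommitment case only. Several things in it are assumptions made for illustration: `toy_commit` is a SHA-256 hash commitment standing in for Com (it is only computationally binding, not perfectly binding as the protocol requires), the `verify` callback stands in for signature verification under the accused party's verification key, and the `Cert` field names are hypothetical. The wrongCircuit and selectiveOTattack cases need a garbling scheme and are left out.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


def toy_commit(x: bytes, r: bytes) -> bytes:
    """Stand-in for Com(x; r): a hash commitment (NOT perfectly binding)."""
    return hashlib.sha256(len(r).to_bytes(4, "big") + r + x).digest()


@dataclass(frozen=True)
class Cert:
    party_id: str    # identity of the accused party
    key: str         # error tag, e.g. "wrongDecommitment"
    message: tuple   # for wrongDecommitment: (c, x, r)
    sigma: bytes     # signature(s) of the accused party on the message


def judgement(cert: Cert, verify: Callable[[tuple, bytes], bool]) -> Optional[str]:
    """Return the accused identity, or None (the algorithm's 'none')."""
    # A certificate without a valid signature never blames anyone.
    if not verify(cert.message, cert.sigma):
        return None
    if cert.key == "wrongDecommitment":
        c, x, r = cert.message
        # Blame only if the signed opening really fails to open c.
        return cert.party_id if c != toy_commit(x, r) else None
    # Other error tags are not handled in this sketch.
    return None
```

The order of the checks mirrors the algorithm: the signature is verified first, so a certificate can only ever accuse a party of something that party itself signed.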


Note that even for very small replication factors this construction gives a
reasonable deterrence factor: e.g., $\ell = 3$ and $m = 3$ lead to $\epsilon = 50\%$. We can
now proceed to the proof.
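
The deterrence factor of Theorem 1, $\epsilon = (1 - 1/\ell)(1 - 2^{-m+1})$, is easy to evaluate exactly. The short Python sketch below does so with rational arithmetic; the function name `deterrence_factor` is hypothetical.

```python
from fractions import Fraction


def deterrence_factor(ell: int, m: int) -> Fraction:
    """Evaluate epsilon = (1 - 1/ell) * (1 - 2^(-m+1)) exactly."""
    if ell < 1 or m < 1:
        raise ValueError("ell and m must be positive integers")
    circuit_term = 1 - Fraction(1, ell)            # cut-and-choose over ell circuits
    sharing_term = 1 - Fraction(1, 2 ** (m - 1))   # m-out-of-m input sharing
    return circuit_term * sharing_term
```

For $\ell = 3$ and $m = 3$ this gives $(2/3) \cdot (3/4) = 1/2$, matching the 50% figure above; increasing either parameter pushes $\epsilon$ toward 1 at the cost of more garbled circuits or more OT invocations.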

Proof Sketch: We show that our protocol satisfies each of the properties
required by the definition of covert security with public verifiability. We will
use the similarity between our protocol and the one
of [AL07] to argue for covert security with $\epsilon$-deterrent.

Corrupted $P_2$. Our protocol achieves security in the presence of a malicious
$P_2$. The security follows from the $\mathcal{F}^{\mathrm{signedOT}}$-functionality (which, as we have seen,
can be implemented efficiently with malicious security) and the same reasoning
as in [AL07], with the exception that here we use a fully secure malicious OT
instead of a covert one. We are therefore left with the case where $P_1$ is corrupted.
Simulatability with $\epsilon$-deterrent. Our protocol is in fact the same protocol
as in [AL07], with the following differences: (1) In Steps 5 and 6, $P_1$ sends
its messages together with a signature on them. (2) In Steps 4 and 8, signed
OT is used instead of standard OT. (3) In Step 9, if $P_2$ outputs $\mathsf{corrupted}_1$,
then it sends Cert = Blame(view$_2$) to the adversary. Let $\pi_0$ be the protocol
of [AL07, Section 6.3] and $\pi_1, \pi_2, \pi_3$ the protocols after the changes explained in
items 1, 2, 3 respectively.

Protocols $\pi_1$ and $\pi_2$ differ from $\pi_0$ only because $P_1$ signs the messages it sends
to $P_2$. In the full version, we show that if $\pi$ is a covert secure protocol with
$\epsilon$-deterrent and $\pi'$ is the same protocol as $\pi$ with the only change that parties
sign all the messages they send, then $\pi'$ is also a covert secure protocol with
$\epsilon$-deterrent. We therefore conclude that $\pi_2$ is also a covert secure protocol with
$\epsilon$-deterrent.

The only difference between $\pi_3$ and $\pi_2$ is that if $P_2$ outputs $\mathsf{corrupted}_1$, then
the adversary learns the certificate Cert. In the full version, we show that this
extra information can be simulated as well, and so the overall protocol is a covert
secure protocol with $\epsilon$-deterrent.

Accountability. Accountability follows from the description of the protocol
$\pi$ and the Blame and Judgement algorithms: an adversarial $P_1$ who constructs one
faulty circuit must decide before the oblivious transfer in Step 8 if it wishes to
abort (in which case there is no successful cheating) or if it wishes to proceed
(in which case $P_2$ will receive an explicitly invalid opening and a signature on
it). Note that due to the security of the oblivious transfer, $P_1$ cannot know what
value $\gamma$ party $P_2$ inputs, and so cannot avoid being detected.

Once the honest party outputs the certificate, it contains all the necessary 
information that caused the party to decide on the corruption. The verification 
algorithm Judgement performs exactly the same check as the honest party, and 
so accountability holds. 

Defamation-Free. We need to show that for every PPT adversary $\mathcal{A}$ controlling
$i^* \in \{1, 2\}$ and interacting with the honest party, there exists a negligible
function $\mu(\cdot)$ such that for all sufficiently large $x_1, x_2, z \in (\{0, 1\}^*)^3$:

$$\Pr[\mathsf{Cert}^* \leftarrow \mathcal{A} ;\ \mathsf{Judgement}(\mathsf{Cert}^*) = id_{3-i^*}] \leq \mu(n)$$

The above follows from the security of the signature scheme. Since Judgement
never outputs the identity of $P_2$ and may only output the identity of $P_1$, the
only interesting case is when the adversary controls $P_2$ and succeeds in creating
a forged certificate Cert* for which Judgement(Cert*) = $id_1$. Since $P_1$ is honest,
it follows the protocol specification, creates all the circuits correctly and
consistently, and opens the commitments correctly. Recall also that every signature
the honest $P_1$ produces contains meta-information about the message (such as the
identities of the participating parties, a unique protocol identifier, a message identifier,
etc.) to ensure that a corrupted $P_2$ cannot mix and match signatures obtained
during different protocols to create a forged certificate. Therefore, if the
adversary produces a certificate that passes the verification, it must have forged one
of the messages. A more formal argument appears in the full version. □

Acknowledgements. The authors would like to thank Yehuda Lindell for 
Abstract. In [?], the authors presented a unified framework for
constructing Universally Composable (UC) secure computation protocols,
assuming only enhanced trapdoor permutations. In this work, we weaken
the hardness assumption underlying the unified framework to only the
existence of a stand-alone secure semi-honest Oblivious Transfer (OT)
protocol. The new framework directly implies new and improved UC
feasibility results from only the existence of a semi-honest OT protocol in
various models. In many models this assumption is also necessary, since the
existence of UC-OT implies the existence of a semi-honest OT protocol.

Furthermore, we show that by relying on a more fine-grained analysis
of the unified framework, we obtain concurrently secure computation
protocols with super-polynomial-time simulation (SPS), based on
the necessary assumption of the existence of a semi-honest OT protocol
that can be simulated in super-polynomial time. When the underlying
OT protocol has constant rounds, the SPS secure protocols constructed
also have constant rounds. This yields the first construction of
constant-round secure computation protocols that satisfy a meaningful notion of
concurrent security (i.e., SPS security) based on tight assumptions.

A notable corollary following from our new unified framework is that
stand-alone (or bounded-concurrent) password authenticated key-exchange
protocols (PAKE) can be constructed from only semi-honest OT protocols;
combined with the result of [?] that the existence of PAKE protocols
implies that of OT, we derive a tight characterization of PAKE protocols.

1 Introduction 

The notion of secure multi-party computation allows $m$ mutually distrustful
parties to securely compute a functionality $f(\bar{x}) = (f_1(\bar{x}), \ldots, f_m(\bar{x}))$ of their
corresponding private inputs $\bar{x} = x_1, \ldots, x_m$, such that party $P_i$ receives the value
$f_i(\bar{x})$. Loosely speaking, the security requirements are that the parties learn
nothing more from the protocol than their prescribed output, and that the out- 
put of each party is distributed according to the prescribed functionality. This 
should hold even in the case that an arbitrary subset of the parties maliciously 
deviates from the protocol. 
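
As a point of reference for this definition, the following Python sketch shows the ideal behaviour that a secure protocol is meant to emulate: an imagined trusted party collects all inputs and hands each party only its own output. The function name `ideal_execution` and the example functionality are hypothetical and chosen purely for illustration.

```python
from typing import Callable, Sequence


def ideal_execution(
    inputs: Sequence[int],
    component_functions: Sequence[Callable[[Sequence[int]], int]],
) -> list[int]:
    """Trusted-party evaluation: party i learns only f_i(x_1, ..., x_m)."""
    if len(inputs) != len(component_functions):
        raise ValueError("need exactly one component function per party")
    return [f_i(inputs) for f_i in component_functions]


def holds_maximum(i: int) -> Callable[[Sequence[int]], int]:
    """Example f_i: party i learns one bit, whether its input is the largest."""
    return lambda xs: int(xs[i] == max(xs))
```

For instance, with inputs `[3, 9, 4]` and `holds_maximum(i)` for each party, the outputs are `[0, 1, 0]`: each party learns whether it holds the maximum and nothing else about the other inputs.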

Shortly after the notion was proposed, strong results were established for
secure multi-party computation. Specifically, it was shown that any
probabilistic polynomial-time computable multi-party functionality can be securely
computed, assuming the existence of enhanced trapdoor permutations [?]. The original
setting in which secure multi-party protocols were investigated, however, only
allowed the execution of a single instance of the protocol at a time; this is the
so-called stand-alone setting. A more realistic setting is one that allows the
concurrent execution of protocols. In the concurrent setting, many protocols are
executed at the same time. This setting presents the new risk of a coordinated
attack in which an adversary interleaves many different executions of a protocol
and chooses its messages in each instance based on other partial executions of the
protocol. The strongest (but also most realistic) setting for concurrent security,
called Universally Composable (UC) security [?], considers the execution of an
unbounded number of concurrent protocols in an arbitrary, adversarially
controlled network environment. Unfortunately, security in the stand-alone
setting does not imply security in the concurrent setting. In fact, without assuming
some trusted set-up, the traditional simulation-based notion of concurrent
security, and in particular UC security, cannot be achieved in general [?].

To circumvent the broad impossibility results, two distinct veins of research 
can be identified in the literature. 

Trusted Set-Up Models: A first vein of work, initiated by Canetti and
Fischlin [?] and Canetti, Lindell, Ostrovsky and Sahai [?] (see also, e.g., [?]),
considers constructions of UC-secure protocols using various trusted set-up
assumptions, where the parties have limited access to a trusted entity.

Relaxed Models of Security: Another vein of work considers relaxed models
of security such as quasi-polynomial simulation [?] or
input-indistinguishability [?]. These works circumvent the use of trusted set-ups, but only
provide weak guarantees about the computational advantages gained by an
adversary in a concurrent execution of the protocol.

In [?], we provided a general unified framework to construct UC-secure
protocols in both trusted set-up models and relaxed security models. In more detail,
we showed that for any such model, the construction of UC protocols for
realizing any multi-party functionality reduces to the construction of a so-called
“UC-puzzle” and a so-called strongly non-malleable witness indistinguishable
(SNMWI) argument of knowledge. Intuitively, a “UC-puzzle” is a protocol
that has the property that no adversary can successfully complete the puzzle
and also obtain a trapdoor, but there exists a simulator who can generate
(correctly distributed) puzzles together with trapdoors; and an SNMWI argument
ensures that no man-in-the-middle adversary can correlate the witness it uses
in a proof with the witness in the proof it receives. We then showed that an
SNMWI argument can be implemented using any non-malleable commitment
scheme; therefore the task of realizing UC security in any model reduces to the
task of constructing a “UC-puzzle” in that model, which can be easily achieved
in almost all previously considered set-up and relaxed security models.
Furthermore, in many models, we showed that the existence of a “UC-puzzle” is also
necessary; in a sense, the notion of “UC-puzzle” characterizes the “minimal”
set-up and relaxation of security needed for achieving UC security.

1 An SNMWI argument can be viewed as an analogue of non-malleable commitments
in the context of strongly WI proofs [?].
In this work, we focus on a different dimension: namely, given the minimal
set-up and relaxation of security needed for UC, what is the “minimal” computational
assumption additionally needed for constructing UC secure protocols? In [?], the
construction of UC protocols from “UC-puzzles” is based on the existence of
enhanced trapdoor permutations (TDPs), whereas stand-alone secure multi-party
computation protocols can be constructed based on the minimal assumption of
the existence of stand-alone secure semi-honest OT protocols [?], which
clearly is also a necessary assumption. This immediately raises the following
question.

Can we base UC security on the minimal assumption of the existence of
a semi-honest OT protocol?

1.1 Previous Works 

Immediately after the work of [?], there have been several works trying to address
this problem in specific models.

In the KR and CRS models: Damgård et al. [?] showed that UC security can be
achieved assuming only semi-honest OT protocols in the key registration (KR)
and common reference string (CRS) models, as well as the uniform reference string (URS)
model. Their constructions in the KR and the more general arbitrary KR
(A-KR) models achieve optimal round complexity, that is, the same number
of rounds as the underlying semi-honest OT protocol (up to a constant factor).
However, the round complexity of their construction in the CRS and URS models
grows linearly with the number of players in the protocol execution.
Furthermore, their construction in the CRS and URS models only implements an ideal
functionality $\mathcal{F}$ in a single session, meaning that every execution of their protocol
needs to invoke the CRS functionality to obtain an independently sampled
reference string. In contrast, previous constructions of UC secure protocols in the
CRS model directly implement the multi-session extension of $\mathcal{F}$ [?], so that
different protocol executions may share the same CRS.

In the $\mathcal{F}_{\text{coin-toss}}$ hybrid model: In the context of characterizing functionalities
that are complete for achieving UC security, Maji, Prabhakaran and Rosulek
[?] showed that the ideal two-party coin-tossing functionality $\mathcal{F}_{\text{coin-toss}}$ is
“complete”, in the sense that, assuming the existence of semi-honest OT protocols,
practically all functionalities can be UC-securely realized when players have
access to the $\mathcal{F}_{\text{coin-toss}}$ functionality, with the same round complexity as the OT
protocol.

In the tamper-proof hardware model: Goyal et al. showed that in the model
where players can generate and exchange tamper-proof hardware tokens, UC
security can be achieved under the weaker assumption of one-way functions,
or even unconditionally, in a constant number of rounds.

The above-mentioned previous works try to weaken the assumptions that UC
security is based on by using different techniques and exploiting different features
of the specific models under consideration. This immediately raises the question
whether we can achieve UC security from semi-honest OT protocols in a generic
way as in [?], independent of the specifics of different set-up or relaxed security
models.

2 More precisely, all well-formed functionalities can be UC-securely realized.

Can we base UC security only on the existence of semi-honest OT
protocols, generically?

Furthermore, can we do so with optimal round complexity?

Such a generic construction would not only help us identify and understand the
key elements needed for achieving UC security, but also allow us to obtain new
UC-feasibility results in other models easily.

Furthermore, one common limitation of the previous results is that they all 
used the trusted set-ups in a strong way so that different protocol executions 
have different and independent “trapdoors”, which makes UC security relatively 
easy to achieve. Let us explain the intuition. In order to construct a protocol 
secure in the concurrent setting, we need to establish two properties: Concur- 
rent simulation, that is, the simulator can simulate messages from the honest 
players in many concurrent sessions for the adversary, and concurrent simulation- 
soundness, that is the adversary even when receiving simulated messages cannot 
break the security of the protocol against honest players. The concurrent simu- 
lation property can be established easily as long as there is a single trapdoor (or 
correlated trapdoors) shared by all protocol executions; the simulator can simply 
use that trapdoor to simulate. The concurrent simulation-soundness property, 
on the other hand, is much harder to establish, and often involves the use of 
non-malleable primitives to ensure independence of different protocol executions 
as in [?]. However, in the case where different sessions have independent
trapdoors, concurrent simulation-soundness can be obtained “for free”, as re- 
ceiving simulated messages (containing information of one trapdoor) does not 
help the adversary obtain other trapdoors; hence, the security of the protocol 
w.r.t. the honest players remains. 

Indeed, all previous works use the trusted set-up to generate independent
trapdoors for different protocol executions. In the CRS (resp. URS) model, [?]
constructed protocols that implement a general functionality $\mathcal{F}$ in a single
session, meaning that each execution of their protocol invokes the CRS (resp.
URS) functionality independently, which yields independent trapdoors (that is,
independent secrets associated with different CRSs (resp. URSs)). In the KR
and A-KR models of [?], every player is registered with a valid public key that
has a corresponding secret key; furthermore, the secret key of any honest player
is hidden even if the adversary obtains the secret keys of all other players.
Naturally, the secret keys of players are used as independent trapdoors. The same
happens in the tamper-proof hardware token model, where the freshly generated
hardware tokens in each session yield independent trapdoors for different
sessions. Finally, in the ideal coin-tossing hybrid model, the $\mathcal{F}_{\text{coin-toss}}$ functionality
is used to sample an independent URS in every session.

However, in many “weaker” models, there is only a single trapdoor (or
correlated trapdoors) across many protocol executions. Then the techniques used in
previous works no longer apply, and the protocol construction needs to explicitly
“inject” independence to establish simulation soundness. Such set-up models
include the CRS model, when the protocol construction directly implements the
multi-session extension of functionalities, the single imperfect string (sunspot)
model [?], the timing model [?] and the bounded concurrency model [?].
Furthermore, the super-polynomial time simulation model also shares the same flavor:
though each protocol execution session may generate its own trapdoor (for
instance, the pre-image of a randomly sampled image of a one-way function),
receiving information about the trapdoor in one session, obtained via the
super-polynomial time power of the simulator, does help the adversary break
the trapdoor in other sessions, as the adversary may create correlations between
trapdoors in different sessions. Naturally, the question left open by previous
works is:

Can we construct UC secure protocols when there are only correlated
trapdoors, based on the existence of semi-honest OT protocols?


1.2 Our Results 

In this work, we answer both questions above affirmatively. We improve upon the
result in [?] to obtain a new unified framework for constructing UC secure
protocols, assuming only the existence of semi-honest OT protocols. More precisely,
the main theorem that we establish is:

Theorem 1 (Unified Framework from OT, Informal). Assume the existence
of a $t_1(\cdot)$-round UC-secure puzzle $\Sigma$ using some set-up $\mathcal{T}$, and the
existence of a $t_2(\cdot)$-round stand-alone secure semi-honest oblivious-transfer protocol.
Then, for every $m$-ary functionality $f$, there exists an $O(t_1(\cdot) + t_2(\cdot))$-round
protocol $\Pi$ (using the same set-up $\mathcal{T}$) that UC-realizes the multi-session extension
of $f$.

We remark that since our main theorem is general and only requires the security
model to admit a single UC-puzzle, the unified framework we provide
encompasses even models where there are only correlated trapdoors, as detailed below.

Trusted Set-up Models: As shown in [3], many trusted set-up models admit
constant-round UC-puzzles assuming the existence of one-way functions. Thus,
our unified framework immediately yields UC feasibility results from only
semi-honest OT, in a wide range of set-up models.

Corollary 1 (Trusted Set-up Models). Assume the existence of a $t(\cdot)$-round
stand-alone secure semi-honest oblivious-transfer protocol. Then, for every
$m$-ary functionality $f$, there exists an $O(t(\cdot))$-round protocol $\Pi$ that UC-realizes the
multi-session extension of $f$ in the following models:

- Tamper-proof hardware model [?].

- Key registration (KR) model [?].

- Chosen common reference string (C-CRS) model [?], any common reference
string (A-CRS) model [?] and uniform reference string (URS) model [?].

- Timing model [?].

- Multi-string model [?].

- Single imperfect string (sunspot) model [?] (assuming additionally the
existence of collision-resistant hash functions).


We compare our results with previous works. In the tamper-proof hardware
model (line 1), our feasibility result is weaker than that of [?], which achieved
UC unconditionally. In the key-registration models (line 2), we re-prove the result
in [?]. In the CRS and URS models (line 3), we obtain new feasibility results
that directly implement the multi-session extension of functionalities, instead of
implementing only a single session as in [?]; furthermore, we improve the round
complexity to that of the OT protocol, whereas in [?] the round complexity
grows linearly with the number of players in the protocol execution. In the rest
of the set-up models (lines 4 to 6), which only admit correlated trapdoors, we obtain
new UC feasibility results from only semi-honest OT.

Optimal Round-Complexity: We remark that the round complexity of our
construction depends solely on, and is of the same order as, that of the underlying semi-honest
OT protocol. Therefore, assuming the existence of a constant-round semi-honest
OT protocol, we obtain constant-round UC secure protocols in all the above-mentioned
models.

Sufficient and Necessary Assumption for UC Security: Our main theorem shows
that $t$-round semi-honest OT protocols are sufficient for UC security in various
models. In fact, they are also necessary in many models. As shown in [?], $t$-round
UC secure computation in the key registration, CRS and URS models (lines 2
and 3) implies $t$-round semi-honest OT; since the single-CRS and single-URS
models are strictly weaker than their one-CRS-per-session and
one-URS-per-session versions, the implication also holds in these two models. It is easy to
see that the same is true in the timing model. Therefore, our result yields a
tight characterization of the feasibility of $t$-round UC secure computation (from
semi-honest OT of the same round complexity, up to a constant factor) in the
key-registration, CRS, URS, single-CRS, single-URS and timing models.

Super-Polynomial Time Simulation Model. In a super-polynomial time
simulation model with simulation time $T$ (where $T$ can be, say, quasi-polynomial time
(QPT) or sub-exponential time (subEXP)), assuming the existence of a one-way
function that is hard to invert in polynomial time but easy to invert (with
probability 1) in $T$ time, there exists a one-message UC-puzzle in the $T$-time simulation
model. Note that when considering subEXP time simulation, the assumption of
one-way functions invertible in subEXP time is simply implied by the existence
of any one-way function. Therefore, applying our main theorem, we have:

3 The UC puzzle simply consists of one message from the sender, which is the image of a
random string under that one-way function. It is hard for a polynomial-time adversary
to break the puzzle (i.e., obtain a pre-image), but easy for a $T$-time simulator.

Corollary 2 (Super-Polynomial Time Simulation Models). Assume the
existence of a $t(\cdot)$-round stand-alone secure semi-honest oblivious-transfer
protocol secure for subEXP time. Then, for every $m$-ary functionality $f$, there
exists an $O(t(\cdot))$-round protocol $\Pi$ that realizes $f$ with subEXP-time-simulation
security. Furthermore, the real and ideal executions are indistinguishable to all
subEXP-time distinguishers.

This result weakens the assumptions on which SPS secure protocols can be based:
previous constructions require either strong complexity assumptions [?] or
the existence of enhanced trapdoor permutations secure against
super-polynomial time [?].

Moreover, our subEXP-secure protocols have optimal round complexity. The
construction relies on the existence of semi-honest OT protocols that are secure
for subEXP time (i.e., semi-honest OT protocols that are simulatable by a
subEXP-time simulator, where the simulation is indistinguishable from the real
execution to subEXP-time distinguishers). This assumption is in fact necessary
in order to achieve the strong security guarantees provided by our unified
framework: protocols constructed through our unified framework admit simulations
(i.e., ideal-world executions) that are indistinguishable from the real
execution not only to all polynomial-time distinguishers, but also to
distinguishers with the same running time as the simulator; we call this strong
SPS-security.

Constant-Round SPS Security from Poly-Time Secure OT. As discussed above,
strong SPS security necessarily relies on super-polynomial-time hard OT
protocols. We show that, in fact, the use of super-polynomial-time hardness
assumptions can be circumvented when considering a weaker notion of security
called plain SPS-security, where the simulator may take super-polynomial time,
but the simulation produced is only indistinguishable w.r.t. polynomial-time
distinguishers. (In fact, this is the security guarantee achieved in the first
two positive results on SPS security in [?], although they still required
super-polynomial-time hardness assumptions.) Given a semi-honest OT protocol
that is simulatable in subEXP time but only indistinguishable to PPT
distinguishers (call it a subEXP-simulatable semi-honest OT protocol), we have:

Theorem 2 (Plain SPS-Security from Polynomial-Time OT). Assume
the existence of a t(·)-round stand-alone secure subEXP-simulatable semi-honest
oblivious-transfer protocol. Then, for every m-ary functionality f, there exists
an O(t(·))-round protocol Π that realizes f with plain subEXP-time-simulation
security.

4 Every one-way function can be inverted in exponential time using brute force.
Therefore, by appropriately scaling down the security parameter, we obtain
one-way functions that can be inverted in sub-exponential time.

5 The informal statement of our unified framework in Theorem 1 does not
explicitly specify the complexity of the simulator and distinguisher, nor their
relationship with the hardness of the OT in the assumption. More precisely, our
unified framework holds for arbitrary classes C_sim of simulators and
distinguishers, assuming an OT protocol that is secure for C_sim. See Section 3
for a formal treatment of the security definition and the statement of our
unified framework in Theorem 3.
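
The one-message puzzle of footnote 3, together with the brute-force inversion
of footnote 4, can be illustrated with a toy sketch. Everything named below is
an assumption made for illustration: SHA-256 stands in for the one-way
function, the tiny input length n_bits plays the role of the scaled-down
security parameter, and the function names are hypothetical.

```python
import hashlib
import secrets
from typing import Tuple


def toy_owf(x: int, n_bits: int) -> bytes:
    """Stand-in one-way function on n_bits-bit inputs (SHA-256 is an assumption)."""
    width = (n_bits + 7) // 8
    return hashlib.sha256(x.to_bytes(width, "big")).digest()


def sender_puzzle_message(n_bits: int) -> Tuple[int, bytes]:
    """The single puzzle message: the image of a random string r."""
    r = secrets.randbelow(2 ** n_bits)
    return r, toy_owf(r, n_bits)


def superpoly_simulator(y: bytes, n_bits: int) -> int:
    """Brute-force inversion: up to 2**n_bits evaluations of the function."""
    for candidate in range(2 ** n_bits):
        if toy_owf(candidate, n_bits) == y:
            return candidate
    raise ValueError("no pre-image found")


def is_trapdoor(candidate: int, y: bytes, n_bits: int) -> bool:
    """A trapdoor (puzzle answer) is any pre-image of the puzzle message."""
    return toy_owf(candidate, n_bits) == y
```

With n_bits = 12 the exhaustive search finishes instantly; the point of the
sketch is only that the search cost grows as 2**n_bits, so the input length
controls how much time the simulator needs.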

Recently, Canetti, Lin, and Pass [?] showed how to achieve plain SPS-security
assuming only enhanced trapdoor permutations; however, their construction
requires polynomially many communication rounds, whereas our construction
yields constant-round protocols assuming that the underlying OT protocol has
constant rounds. In concurrent and independent work, Garg, Goyal, Jain and
Sahai [?] also present a construction of constant-round SPS-secure protocols,
but they additionally assume the existence of collision-resistant hash functions
besides that of semi-honest OT.⁶ Finally, we remark that our assumption is
again tight: secure protocols with plain subEXP-time-simulation security imply
OT protocols that can be simulated using subEXP time.

Password-Key Exchange from OT. As another application of our unified
framework, we consider another line of relaxation, bounded concurrency: in the
concurrent execution of protocols, there is an a priori bound on the total
number of sessions that may coexist at any point in time. This line of
relaxation has been previously considered in several works [?]; they showed
how to construct bounded-concurrent secure computation using non-black-box
techniques, based on the existence of collision-resistant hash functions. We
show that, in fact, the model of bounded concurrency can be cast as a special
case of our generalized model of UC security, by considering a restricted class
of environments that respect the bound m_2 on the total number of concurrent
executions and additionally exchange only a bounded number m_1 of messages with
the adversary. We call this the (m_1, m_2)-bounded concurrency model. Therefore,
by constructing an O(m_1 + m_2)-round UC-puzzle in this model, we immediately
obtain the following feasibility result.

Corollary 3 (Bounded Concurrency Model). Let m_1 and m_2 be any polynomials.
Assume the existence of a constant-round stand-alone secure semi-honest
oblivious-transfer protocol. Then, for every m-ary functionality f, there exists
an O(m_1 + m_2)-round protocol Π that securely realizes f in the
(m_1, m_2)-bounded concurrency model.

Lindell [?] showed that O(m) communication rounds are necessary for security
in the (m, 0)-bounded concurrency model, when relying on black-box simulation
techniques; therefore, our construction achieves the optimal round complexity.
Furthermore, it is shown in [?] that the existence of t-round two-party
computation protocols in the (2, 0)-bounded concurrency model implies the
existence of t-round Password-Authenticated Key-Exchange (PAKE) protocols.
Therefore, we obtain O(t)-round PAKE protocols from any t-round semi-honest OT.
Combined with the result of Nguyen [?] that t-round PAKE implies O(t)-round OT,
this resolves the complexity of PAKE protocols. Previous constructions of PAKE
protocols rely on stronger assumptions, namely, the existence of enhanced
trapdoor permutations and collision-resistant hash functions [?]. Another
related work due to Goyal, Jain and Ostrovsky [?] considered a weaker notion of
security,⁷ and constructed PAKE protocols satisfying the weaker notion in the
unbounded concurrent setting based on collision-resistant hash functions.

6 Their proof techniques, however, are significantly different, and it would
seem that an advantage of their approach is that they do not rely on non-uniform
reductions to as large an extent as we do.

1.3 Outline 

We refer the reader to [?] for a formal definition of the generalized model of
UC-security, and the notions of UC-puzzle and SNMWI protocols. In Section 2, we
provide an overview of our techniques. In Section 3, we present our main result
that general UC security can be based on sh-OT protocols, and provide a proof
sketch. The remaining results and formal proofs will appear in the full version.

2 Techniques 

2.1 The LPV Approach 

By relying on previous results [?], the construction of a UC-secure
protocol for realizing any multi-party functionality reduces to the task of
realizing the “ideal Zero-Knowledge functionality”, which amounts to
constructing a zero-knowledge protocol that is both concurrently simulatable and
concurrently simulation-extractable; namely, we can concurrently extract a
witness from every convincing proof given by the adversary, even if it receives
multiple concurrent simulated proofs. The “simulation” part is usually easy to
achieve; as shown in [?], it suffices to provide the simulator a single
“trapdoor”. This is formalized by the notion of a UC-puzzle in [?], which,
intuitively, is a protocol with the property that no adversary can successfully
complete the puzzle and also obtain a trapdoor, but there is a simulator who can
generate puzzle transcripts (distributed statistically close to real
transcripts) together with trapdoors; the former is called the soundness
property and the latter the statistical simulation property. However, obtaining
“simulation-soundness” is significantly harder. In [?], the authors achieve this
in two steps: first, construct a “special-purpose” zero-knowledge protocol that
is concurrently simulation-sound (namely, even if an adversary receives multiple
concurrent simulated proofs, it cannot prove any false statements); then,
strengthen security to get simulation-extractability.

The first step relies on a primitive called strong non-malleable
witness-indistinguishable (SNMWI) arguments, which captures the non-malleability
property w.r.t. strongly witness-indistinguishable proofs. Informally, an SNMWI
argument ensures that no man-in-the-middle adversary can correlate the witness
it uses in a proof with the witness in the proof it receives. It is shown in [?]
that SNMWI arguments can be constructed from non-malleable commitments. At a
high level, the simulation-sound protocol follows the Feige-Shamir paradigm, in
which the verifier first sends a UC-puzzle to establish a “trapdoor” (that is,
the puzzle answer), and then the prover proves that either the statement is true
or it knows a trapdoor, using an SNMWI argument.⁸ In essence, the UC-puzzle
enables concurrent simulation: a simulator can simulate the puzzle executions
with the verifier to obtain corresponding answers, and then use them as
trapdoors to successfully simulate the SNMWI arguments. On the other hand, the
SNMWI property ensures simulation-soundness: even if the adversary receives
SNMWI proofs using the trapdoors as “fake witnesses”, the adversary does not do
the same.

7 More precisely, the security notion of [?] is defined through the simulation
paradigm where the simulator may rewind the trusted functionality (for instance,
the ideal PAKE functionality) a limited number of times, whereas we achieve full
security without rewinding. On the other hand, their protocols are secure in the
unbounded concurrent setting, whereas ours are only secure in the bounded
concurrent setting.

The second step in [?] enhances the security by employing the compilation
technique of [?], which transforms a concurrently simulation-sound protocol
into one that is concurrently simulation-extractable, using enhanced trapdoor
permutations (TDP).


2.2 UC-Security from Semi-honest OT 

In this work, we weaken the assumption that UC security relies on, by providing
a new compilation technique for transforming a simulation-sound protocol
into a simulation-extractable one, relying only on stand-alone semi-honest
oblivious transfer (sh-OT) protocols. Our compilation technique uses ideas
similar to those in [?], which achieve extractability using OT; furthermore,
interestingly, though our compilation technique is non-black-box, it is inspired
by the black-box compilation technique used in [?] for transforming a sh-OT
protocol into one secure against malicious adversaries (m-OT protocol). At a
very high level, we use the idea of having an OT execution with two random
inputs at the prover’s side (acting as the sender) and fixed input index 1 at
the verifier’s side (acting as the receiver), and later letting the prover use
the second random input to hide the witness. This idea leads to a simple
protocol as, even if the verifier deviates from the honest behavior in the OT
execution, it learns no information about the witness; therefore, it suffices to
require the verifier to prove its honest behavior after the OT execution
(instead of giving a proof after every message in the OT execution as the
standard technique requires). Next, we explain our compilation technique in more
detail.

8 The actual protocol is more complicated, as the notion of SNMWI arguments is
only defined with respect to languages with unique witnesses. But for an
intuitive explanation of the high-level ideas here, we omit the complication.

First, it follows from standard techniques that the existence of sh-OT protocols
implies the existence of a full-fledged OT protocol against malicious
adversaries (m-OT for short). Then, given a simulation-sound ZK (ssZK) protocol,
our compilation technique outputs a protocol (P, V) as follows: in the first
stage, the prover and the verifier participate in an execution of an m-OT
protocol where the prover acts as the OT sender using two random inputs r_1 and
r_2 and the verifier acts as the OT receiver choosing the first input; in the
second stage, the verifier proves that it has used input index 1 in the OT
execution using the ssZK protocol; if the proof is accepting, the prover then
sends the witness w padded with the second random input, w ⊕ r_2, in the third
stage, followed by a proof in the fourth stage that this message XORed with the
second sender’s input in the OT execution is indeed a valid witness of the
statement being proved, using again the ssZK protocol. The high-level idea of
the protocol (P, V) is simple. First of all, it is concurrently simulatable: to
simulate a proof of statement x, a simulator can send a random string in the
third stage in place of w ⊕ r_2 and “cheat” in the proof in the last stage by
relying on the concurrent-simulation property of the ssZK protocol (it acts
honestly in the first two stages). To see that (P, V) is further concurrently
simulation-extractable, consider a man-in-the-middle adversary that receives
many proofs, referred to as the left proofs, and gives many proofs, referred to
as the right proofs, concurrently. We construct a simulator-extractor (which
eventually corresponds to the simulator of our UC-secure protocols) that
concurrently simulates all the left proofs as described above and extracts a
witness from every convincing right proof as follows: in a right proof, the
simulator-extractor (acting as the verifier) chooses the second input in the OT
execution and “cheats” in the proof in the second stage, relying again on the
concurrent-simulation property of the ssZK protocol; it then recovers the
witness by simply XORing the third-stage message with the second input it
obtains in the OT execution. To show that the simulator-extractor always
extracts valid witnesses from the adversary, it boils down to showing that the
adversary is never able to prove a false statement using the ssZK protocol, even
amid simulation, which essentially relies on the simulation-soundness property
of the ssZK protocol.
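
The data flow of the OT-based compilation described above can be sketched as
follows. This is a toy illustration under strong simplifying assumptions: the
OT is replaced by an idealized function call, both ssZK proofs are omitted
entirely, and all function names are hypothetical.

```python
import secrets
from typing import Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def ideal_ot(sender_inputs: Tuple[bytes, bytes], receiver_index: int) -> bytes:
    """Idealized 1-out-of-2 OT: the receiver learns only the chosen input (index 1 or 2)."""
    return sender_inputs[receiver_index - 1]


def prover_ot_inputs(length: int) -> Tuple[bytes, bytes]:
    """First stage: the prover picks two random OT inputs r_1 and r_2."""
    return secrets.token_bytes(length), secrets.token_bytes(length)


def prover_third_stage(witness: bytes, r2: bytes) -> bytes:
    """Third stage: the prover sends the witness padded with r_2."""
    return xor_bytes(witness, r2)


def extractor_recover(third_stage_msg: bytes, ot_output: bytes) -> bytes:
    """The simulator-extractor chose index 2 in the OT, so it can strip the pad."""
    return xor_bytes(third_stage_msg, ot_output)


def simulator_third_stage(length: int) -> bytes:
    """The simulator has no witness; it sends a random string instead of w XOR r_2."""
    return secrets.token_bytes(length)
```

An honest verifier calls `ideal_ot` with index 1 and learns only r_1, which is
independent of the third-stage message; the extractor deviates by using index 2
and obtains the pad r_2 directly.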

However, some subtleties arise: the simulator-extractor simulates for the
adversary both proofs of the ssZK protocol and OT executions. The
simulation-soundness property only guarantees that the adversary cannot cheat
when receiving simulated proofs of the ssZK protocol, but not simulated OT
executions. (This problem is in the same spirit as the problems encountered in
[?] when using non-malleable commitments as a sub-protocol in a larger
protocol.) To solve this problem, we enhance the security of our ssZK protocol
so that it is also simulation-sound w.r.t. the OT protocol; namely, even when
the adversary receives many simulated executions of the OT protocol, it still
cannot prove any false statement. In fact, we will design a protocol that is
simulation-sound both w.r.t. itself and w.r.t. any protocol with a fixed bounded
number of rounds; this is achieved by relying on a notion of k-robust SNMWI
protocol, which is an SNMWI protocol that additionally guarantees that no
adversary can correlate the witness it uses in a proof with the “secret” in a
k-round interaction it participates in, provided that messages in that
interaction are indistinguishable (when generated with different secrets). This
notion is analogous to the notion of k-robust non-malleable commitments [?] and,
as we show, can be realized using a k-robust non-malleable commitment scheme.
Since, as shown in [?], k-robust non-malleable commitments can be constructed
from the minimal assumption of OWFs, so can k-robust SNMWI protocols. Finally,
we remark that this problem of robustness is not present in [?]; there, the
compilation technique of [?] only implicitly requires the ssZK protocol to be
simulation-sound w.r.t. non-interactive protocols, which is satisfied by any
ssZK protocol that is an argument of knowledge (as required by the compilation
technique).

An additional issue that we encounter is that for the above argument to go
through, we need the OT protocol to satisfy some additional properties. More
precisely, recall that the proof of concurrent simulatability of (P, V) requires
showing that as long as the adversary can prove that it has acted honestly in
the OT execution with input 1, the sender’s second random input is completely
hidden. At first glance, it seems that this follows directly from the security
of the OT protocol against malicious receivers. However, it may be possible for
a malicious receiver to obtain the second input in the OT execution, but later
explain its behavior with input 1. Fortunately, the security property that we
need is exactly captured by the notion of defensible privacy for the receiver
introduced by [?], which, roughly speaking, ensures that as long as a malicious
receiver can output a good “defense” (that is, an explanation of its behavior as
an honest receiver with input b and random tape σ) at the end of the OT
execution, the honest sender’s other input b' ≠ b must remain hidden.
Furthermore, to show that (P, V) is simulation-extractable, we need the OT
protocol to satisfy that as long as a malicious sender can output a good
“defense”, with inputs r_1, r_2 and random tape σ', after an OT execution, the
honest receiver with input b must obtain r_b. To formalize this security
property, we adapt the notion of defensible privacy of [?] to consider the
correctness requirement; we call it the defensible correctness property.
Therefore, our compilation technique relies on an m-OT protocol that is
defensibly private for the receiver and defensibly correct for the sender; we
show that such a protocol is implied by the existence of sh-OT protocols.

Constant-round SPS-security from polynomial-time hard sh-OT: In [?], the
authors constructed SPS-secure protocols with strong indistinguishability:
real-world executions of these protocols are indistinguishable from ideal-world
simulations, against distinguishers of the same time complexity as the
simulator, which is super-polynomial. To obtain a model of security that can be
implemented in constant rounds from standard polynomial-time hardness
assumptions (in the plain model), we weaken the generalized model of UC security
in [?] to require only plain indistinguishability against PPT distinguishers.
However, even with this weakening, at first glance, it is still unclear how to
achieve plain SPS-security from only polynomial-time hardness assumptions. Let
us illustrate the difficulty using the above-described protocol (P, V) that
implements the ideal ZK functionality.

In order to simulate the view of, and extract witnesses from, a
man-in-the-middle adversary, the simulator-extractor of (P, V) simulates all the
ssZK proofs to the adversary, as well as all the OT executions it participates
in. The latter can be simulated efficiently, but the concurrent simulation of
the ssZK arguments takes super-polynomial time in the SPS model. It then seems
that in order to apply the security guarantees of the sh-OT protocol and the
simulation-soundness property of the ssZK protocol (to show that the view of the
adversary is indistinguishable and that it never proves any false statement), we
need the security of the sh-OT and ssZK protocols to hold against
super-polynomial-time machines (since the adversary, though a PPT machine
itself, receives many simulated proofs generated in super-polynomial time).
Roughly, this is the technical reason why the LPV protocol relies on
super-polynomial hardness assumptions.

To get around this problem, we exploit the structure of the ssZK protocol
constructed in [?]. Recall that it consists of a UC-puzzle execution where the
verifier establishes a trapdoor, followed by a proof using the SNMWI argument
that either the statement is true or a trapdoor is known. The key observation
is that when simulating a proof of this protocol, only the simulation of the
UC-puzzle takes super-polynomial time; once a trapdoor is obtained, the rest of
the simulation can be done efficiently. Therefore, if we modify the protocol
(P, V) to have the puzzle executions in the two ssZK proofs sent at the
beginning of the protocol (call it the preamble phase of the protocol), we
obtain a protocol (P', V') that has the same property: only the preamble phase
of the protocol takes super-polynomial time to simulate (the rest can be
simulated efficiently given the puzzle answers). With this simple change, we now
only need the sh-OT and SNMWI protocols to be secure for polynomial time. To
illustrate our idea, consider first the stand-alone setting. To show that
(P', V') is zero-knowledge, we rely on the “hiding” property of the sh-OT and
the SNMWI protocols; since the simulation of the preamble phase happens before
them, and thus the puzzle answers can be fixed non-uniformly, it suffices to
rely on “hiding” against non-uniform PPT machines.

We use the same idea to prove the concurrent security of (P', V'): establish
the simulation-extractability property of (P', V') in a sequence of hybrids that
gradually simulate each session in two steps (the preamble phase first and then
the rest) in a clever order. More precisely, consider a man-in-the-middle
adversary that participates in m proofs; order all the proofs according to the
sequence in which their preamble phases complete. Then consider a sequence of
2m + 1 hybrids H_0, ..., H_{2m}, where in hybrid H_{2i} the first i sessions are
simulated, and in hybrids H_{2i+1} and H_{2(i+1)} (in addition to the first i
sessions) the preamble phase and the rest of the (i+1)-th session are simulated,
respectively. To show that (P', V') is simulation-extractable, it boils down to
proving that every two subsequent hybrids are indistinguishable and that the
adversary never proves a false statement using the SNMWI argument in any hybrid.
From hybrid H_{2i} to H_{2i+1}, this follows directly from the statistical
simulation property of the UC-puzzles. From hybrid H_{2i+1} to H_{2(i+1)}, this
relies on the security of the sh-OT and SNMWI protocol executions in the
(i+1)-th session; since in these two hybrids only puzzles in the first i+1
sessions are simulated, which happens before the OT and SNMWI executions in the
(i+1)-th session and can be fixed non-uniformly, we only need the security of
the OT and SNMWI protocols to hold against non-uniform PPT machines. Given that
SNMWI arguments are implied by sh-OT protocols, (P', V') implements the ideal ZK
functionality with plain SPS-security based on only polynomial-time hard sh-OT
protocols.
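
The bookkeeping of this hybrid sequence can be made concrete with a short
sketch. It assumes the hybrids are indexed H_0, ..., H_{2m} (2m + 1 in total)
and that sessions are numbered 1..m in the order in which their preamble
phases complete; the function names are hypothetical.

```python
from typing import List, Tuple


def hybrid_state(j: int, m: int) -> Tuple[List[int], List[int]]:
    """Describe hybrid H_j for m sessions.

    Returns (sessions whose preamble is simulated,
             sessions whose remaining messages are simulated).
    """
    if not 0 <= j <= 2 * m:
        raise ValueError("hybrid index out of range")
    i, odd = divmod(j, 2)
    fully_simulated = list(range(1, i + 1))  # the first i sessions
    # In H_{2i+1}, the preamble of session i+1 is simulated as well.
    preamble_simulated = fully_simulated + ([i + 1] if odd else [])
    return preamble_simulated, fully_simulated


def all_hybrids(m: int) -> List[Tuple[List[int], List[int]]]:
    """The full sequence H_0, ..., H_{2m}."""
    return [hybrid_state(j, m) for j in range(2 * m + 1)]
```

Consecutive entries differ in exactly one step: either one more preamble is
simulated (the puzzle step) or one more session remainder is simulated (the
sh-OT and SNMWI step).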




Now, it seems that by simply combining (P', V') with previous constructions
of UC-secure protocols Π that use the ideal ZK functionality IdealZK [?],
we can obtain constant-round plain SPS-secure computation from sh-OT
protocols. Unfortunately, previous constructions rely on the existence of sh-OT
protocols; if composing them with (P', V') in the straightforward way (replacing
every IdealZK call in Π with an invocation of (P', V')), for the composed
protocol Π' = Π^{IdealZK/(P',V')} to be secure in general, we need Π to be
secure against super-polynomial time, which requires super-polynomially hard
sh-OT! To address this, we modify the composed protocol Π' as we did the
protocol (P, V): consider a protocol Π'' that is identical to Π' except that all
the puzzle executions in the invocations of (P', V') are executed in parallel at
the beginning of the protocol (call this again the preamble phase of the
protocol); now Π'' has the property that only its preamble phase takes
super-polynomial time to simulate, and the rest can be simulated efficiently
given the puzzle answers. Therefore, by considering a sequence of hybrids
similar to that in the proof of (P', V'), we prove the security of Π''.

UC-security with bounded concurrency: Let (m_1, m_2)-bounded concurrency denote
a scenario where the UC environment communicates in at most m_1 rounds with the
adversary, and where at most m_2 executions of some protocol take place. It
follows from our unified framework that to construct secure computation
protocols in this model, the key is to construct a UC-puzzle. Towards this, let
us first examine a simple case where, during the execution of any session, the
total number of messages the adversary receives that do not belong to any
UC-puzzle is bounded by a fixed number m. (These messages include ones from the
environment and ones belonging to the non-puzzle part of the other sessions.) In
this case, we can design the UC-puzzle as follows: the puzzle sender sends the
image f(r) of a random value r under a OWF f, followed by m+1 witness-hiding
proofs of knowledge (POKs) of r; the answer to this puzzle is simply a pre-image
of f(r). It follows from the one-wayness of f and the witness-hiding property of
the proofs that no adversary (acting as a puzzle receiver) can complete a puzzle
and obtain an answer. But there is a puzzle simulator that can simulate many
concurrent puzzle executions with an adversary (acting as the puzzle sender) and
extract an answer immediately after each accepting puzzle: the simulator
emulates the puzzle receivers for the adversary honestly, and rewinds the
adversary at one of the POKs to extract an answer. Since in this simple case
there are more POKs (m+1) than the number m of other non-puzzle messages, there
must be one POK at which the simulator can rewind to extract an answer without
needing to simulate any non-puzzle messages (messages belonging to a puzzle can
be simulated trivially by following the honest receiver strategy in the
rewindings).
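
The counting argument behind this simulator can be sketched in a few lines. The
event encoding below is an assumption made purely for illustration: a session
is modeled as a list of events in which the m+1 POKs run one after another and
each non-puzzle message arrives at some point of the schedule; the function
names are hypothetical.

```python
from typing import List


def non_puzzle_counts(schedule: List[str], num_poks: int) -> List[int]:
    """Count the non-puzzle messages that fall inside each sequential POK.

    Events are 'pok_start', 'pok_end', or 'non_puzzle'.
    """
    counts = [0] * num_poks
    current = -1
    inside = False
    for event in schedule:
        if event == "pok_start":
            current += 1
            if current >= num_poks:
                raise ValueError("more POKs than expected")
            inside = True
        elif event == "pok_end":
            inside = False
        elif event == "non_puzzle" and inside:
            counts[current] += 1
    return counts


def rewindable_pok(schedule: List[str], m: int) -> int:
    """Index of a POK with no interleaved non-puzzle message.

    With m+1 POKs and at most m non-puzzle messages, one such POK must exist.
    """
    for index, count in enumerate(non_puzzle_counts(schedule, m + 1)):
        if count == 0:
            return index
    raise ValueError("more than m non-puzzle messages were interleaved")
```

For m = 2 and a schedule in which the first and third POK each contain one
non-puzzle message, `rewindable_pok` returns index 1: the second POK can be
rewound without re-simulating anything outside the puzzle.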
Finally, we show that, in fact, this simple case always holds. As we will see
later, secure computation protocols produced by our framework contain a constant
number c of non-puzzle messages if the underlying sh-OT protocol has constant
rounds. Therefore, in the (m_1, m_2)-bounded concurrent model, the total number
of non-puzzle messages is bounded by m_1 + c·m_2, yielding O(m_1 + m_2)-round
bounded-concurrent secure computation protocols.

3 UC Security Based on Stand-Alone Semi-honest OT

We consider the (C_env, C_sim)-UC model introduced in [?]. The model extends the
framework of universal composability [?]. The key difference from UC is that in
UC, the environment is modeled as a non-uniform PPT machine and the ideal-model
adversary (or simulator) as a uniform PPT machine, whereas in the general model,
the environment and the simulator are allowed to be from arbitrary complexity
classes C_env and C_sim respectively. (Note, however, that the adversary is
still uniform PPT.) One important effect of this change is that the UC
composition theorem [?] no longer holds; as a result, the stand-alone security
of a protocol does not directly imply its concurrent security. To remedy this,
in the general model, an environment executing a protocol π can start many
instances of the protocol, and thus implementing a functionality F in the
general model means directly implementing the multi-session extension F̂ of F.⁹
We focus only on static adversaries. Let cl(C_env, C_sim) represent the closure
of C_env and C_sim that includes all computations by PPT oracle Turing machines
M with oracle access to C_env, C_sim. In this section, we prove the following
main technical theorem.

Theorem 3. Assume the existence of a t_P-round (C_env, C_sim)-secure UC-puzzle
in a G-hybrid model, and a t_OT-round stand-alone sh-OT protocol secure w.r.t.
cl(C_sim, C_env). Then, for every “well-formed” functionality F, there exists an
O(t_P + t_OT)-round protocol Π in the G-hybrid model that
(C_env, C_sim)-UC-realizes F.


3.1 Proof of Theorem 3

Recall that the IdealZK functionality parameterized with a language L
implements the function ZK_L((x, w), x) = (⊥, b), where b = 1 if w is a valid
witness for the membership of x in L and 0 otherwise. Then Theorem 3 follows
from the following two lemmas.

Lemma 1 (IdealZK-Lemma). Assume the existence of a t-round stand-alone sh-OT
protocol secure w.r.t. cl(C_env, C_sim). Then, for every well-formed
functionality F, there exists an O(t)-round protocol Π in the ZK-hybrid model
that (C_env, C_sim)-UC-realizes F.

Lemma 2 (Puzzle-Lemma). Let Π' be a protocol in the IdealZK model. Assume the
existence of a (C_env, C_sim)-secure t_P-round puzzle (S, R) in a G-hybrid
model, a t_OT-round stand-alone sh-OT protocol (S_OT, R_OT) that is secure
w.r.t. cl(C_sim, C_env), and a t_WI-round t_OT-robust SNMWI protocol (P_s, V_s)
secure w.r.t. cl(C_sim, C_env). Then, there exists an O(t_P + t_WI + t_OT)-round
protocol Π in the G-hybrid model that (C_env, C_sim)-UC-emulates Π'.

The first lemma is implicit in previous works [?] for normal UC-security
(i.e., (n.u. PPT, PPT)-UC-security) and can be easily extended to the general
(C_env, C_sim)-UC model assuming a stand-alone sh-OT protocol secure w.r.t.
cl(C_sim, C_env); we omit the proof (see [?] for a similar proof assuming TDPs).
Next, towards proving the puzzle lemma, we provide a general transformation
that transforms any protocol Π in the ZK-hybrid model into a protocol Π' in the
real model using a special-purpose zero-knowledge protocol that is “concurrently
simulatable” and “concurrently simulation-extractable”.

9 Informally speaking, F̂ emulates many independent copies of F running
concurrently; see [?] for a formal definition.

Special-purpose ZK Protocol (P, V). The construction of (P, V) relies on the
following three building blocks, all with security against the class
cl(C_env, C_sim). (1) A t'-round m-OT protocol (S_OT, R_OT) that is defensibly
private for the receiver and defensibly correct for the sender; it follows from
standard techniques [?] that such a protocol exists assuming t_OT-round sh-OT
protocols, with t' = O(t_OT). (2) A t'-robust SNMWI protocol (P_s, V_s); it
follows from a proof similar to that in [?] that such a protocol exists assuming
OWFs, with round complexity O(t'). (We defer the formal construction and proof
of such m-OT and robust SNMWI protocols to the full version.) (3) A
(C_env, C_sim)-secure puzzle ((S, R), ℛ) in a G-hybrid model. For simplicity of
exposition, our description below relies on a statistically binding commitment
scheme com that has unique decommitment, that is, the transcript of the
commitment uniquely determines not only the value committed to but also the
decommitment, with overwhelming probability; but the protocol can be easily
modified to work with any statistically binding commitment (see the full version
for more details). Then, the special-purpose ZK protocol (P, V) for an NP
relation R_L proceeds as follows: to prove a statement x, the prover and
verifier, with identities id_P and id_V and additional auxiliary input
w ∈ R_L(x) for the prover, interact in six stages.

Stage 1: The Prover and Verifier participate in a puzzle-interaction where the
Verifier assumes the role of the sender and the Prover that of the receiver. Let
$\mathrm{TRANS}_{V \to P}$ be the transcript of the messages exchanged.

Stage 2: The Prover and Verifier participate in a second puzzle-interaction with
the roles reversed, i.e., the Prover is the sender and the Verifier is the receiver.
Let $\mathrm{TRANS}_{P \to V}$ be the transcript of the messages exchanged.

Stage 3: The Prover first selects two random strings $r_1, r_2 \in \{0, 1\}^n$. Then the
Prover and Verifier interact using $(S_{OT}, R_{OT})$, where the Prover is the sender
with inputs $(r_1, r_2)$ and the Verifier is the receiver with input 1. Let $\mathrm{trans}_{OT}$
be the transcript of the messages exchanged.

Stage 4: The Verifier commits to $s$ using com. Then it proves, using the protocol
$(P_s, V_s)$ and identity $id_V$, the statement that it committed either to a string
$s$ that contains a valid witness establishing the verifier's input as index 1 in
$\mathrm{trans}_{OT}$ and the string output by the receiver at the end of the Stage 3
protocol, or to a string $s$ such that $(s, \mathrm{TRANS}_{P \to V}) \in \mathcal{R}$.

Stage 5: The Prover sends the string $r = r_2 \oplus w$ in the clear.

Stage 6: The Prover commits to $s'$ using com. Then the prover proves, using the
protocol $(P_s, V_s)$ and identity $id_P$, the statement that it committed either to
a string $s'$ that establishes that the inputs used by the prover in $\mathrm{trans}_{OT}$ are
$(r_1, r_2)$ such that $r_2 \oplus r \in R_L(x)$, or to a string $s'$ such that
$(s', \mathrm{TRANS}_{V \to P}) \in \mathcal{R}$.

Realizing the IdealZK-functionality: Given any protocol $\Pi'$ in the ZK-Hybrid model
and the special-purpose zero-knowledge protocol $(P, V)$, the protocol $\Pi$ in the
real model is constructed from $\Pi'$ by instantiating the IdealZK functionality
using $(P, V)$. Every invocation of the IdealZK functionality in which $P_i$ proves to
$P_j$ a statement $x$ using witness $w$ is replaced with a subroutine call of $(P, V)$
between $P_i$ and $P_j$ where $P_i$ proves the statement $x$ using witness $w$ to $P_j$, using
identities $id_P = i$ and $id_V = j$ respectively. The formal security proof that $\Pi$
emulates $\Pi'$ in the ZK-Hybrid model will appear in the full version.

Abstract. The GLV method of Gallant, Lambert and Vanstone (CRYPTO 2001)
computes any multiple $kP$ of a point $P$ of prime order $n$ lying on an elliptic
curve with a low-degree endomorphism $\Phi$ (called GLV curve) over $\mathbb{F}_p$ as
$kP = k_1P + k_2\Phi(P)$, with $\max\{|k_1|, |k_2|\} < C_1\sqrt{n}$, for some explicit
constant $C_1 > 0$. Recently, Galbraith, Lin and Scott (EUROCRYPT 2009) extended this
method to all curves over $\mathbb{F}_{p^2}$ which are twists of curves defined over
$\mathbb{F}_p$. We show in this work how to merge the two approaches in order to get,
for twists of any GLV curve over $\mathbb{F}_{p^2}$, a four-dimensional
decomposition together with fast endomorphisms $\Phi, \Psi$ over $\mathbb{F}_{p^2}$ acting
on the group generated by a point $P$ of prime order $n$, resulting
in a proven decomposition for any scalar $k \in [1, n]$ given by
$kP = k_1P + k_2\Phi(P) + k_3\Psi(P) + k_4\Psi\Phi(P)$ with $\max_i(|k_i|) < C_2 n^{1/4}$,
for some explicit $C_2 > 0$. Remarkably, taking the best $C_1, C_2$, we obtain
$C_2/C_1 < 412$, independently of the curve, ensuring in theory an almost
constant relative speedup. In practice, our experiments reveal that the
use of the merged GLV-GLS approach supports a scalar multiplication
that runs up to 50% faster than the original GLV method. We
then improve this performance even further by exploiting the Twisted
Edwards model and show that curves originally slower may become extremely
efficient on this model. In addition, we analyze the performance
of the method on a multicore setting and describe how to efficiently
protect GLV-based scalar multiplication against several side-channel attacks.
Our implementations improve the state-of-the-art performance of
point multiplication for a variety of scenarios including side-channel protected
and unprotected cases with sequential and multicore execution.

Keywords: Elliptic curves, GLV-GLS method, scalar multiplication, 
Twisted Edwards curve, side-channel protection, multicore computation. 

1 Introduction 

The Gallant-Lambert-Vanstone (GLV) method is a generic approach to speed
up the computation of scalar multiplication on some elliptic curves defined over
fields of large prime characteristic. Given a curve with a point $P$ of prime order
$n$, it consists essentially in an algorithm that finds a decomposition of an
arbitrary scalar multiplication $kP$ for $k \in [1, n]$ into two scalar multiplications,
with the new scalars having only about half the bitlength of the original scalar.
This immediately enables the elimination of half the doublings by employing the
Straus-Shamir trick for simultaneous point multiplication.

Whereas the original GLV method as defined in m works on curves over $\mathbb{F}_p$
with an endomorphism of small degree (GLV curves), Galbraith-Lin-Scott (GLS)
in 0 have shown that over $\mathbb{F}_{p^2}$ one can expect to find many more such curves by
basically exploiting the action of the Frobenius endomorphism. One can therefore
expect that on the particular GLV curves, this new insight will lead to
improvements over $\mathbb{F}_{p^2}$. Indeed the GLS article itself considers four-dimensional
decompositions on GLV curves with nontrivial automorphisms (corresponding
to the degree one cases) but leaves the other cases open to investigation.

In this work, we generalize the GLS method to all GLV curves by exploiting
fast endomorphisms $\Phi, \Psi$ over $\mathbb{F}_{p^2}$ acting on a cyclic group generated by a point $P$
of prime order $n$ to construct a proven decomposition with no heuristics involved
for any scalar $k \in [1, n]$

\[ kP = k_1P + k_2\Phi(P) + k_3\Psi(P) + k_4\Psi\Phi(P) \quad \text{with } \max_i(|k_i|) < Cn^{1/4} \]

for some explicitly computable $C$. In doing this we provide a reduction algorithm
for the relevant four-dimensional lattice which runs in $O(\log^2 n)$ by implementing
two Cornacchia-type algorithms jtil'721, one in $\mathbb{Z}$, the other in $\mathbb{Z}[i]$. The algorithm
is remarkably simple to implement and allows us to demonstrate an improved
$C = O(\sqrt{s})$ (compared to the value obtained with LLL which is only $\Omega(s^{3/2})$).
Thus, it guarantees a relative speedup independent of the curve when moving
from a two-dimensional to a four-dimensional GLV method over the same underlying
field. If parallel computation is available then the computation of $kP$
can possibly be implemented (close to) four times faster in this case. When moving
from two-dimensional GLV over $\mathbb{F}_p$ to the four-dimensional case over $\mathbb{F}_{p^2}$,
our method still guarantees a relative speedup that is quasi-uniform among all
GLV curves (see Section 4 for details). In fact, we present experimental results
on different GLV curves that demonstrate that the relative speedup between
the original GLV method and the proposed method (termed GLV-GLS in the
remainder) is as high as 50%.

Twisted Edwards curves |2j are efficient generalizations of Edwards curves [3 , 
which exhibit high-performance arithmetic. By exploiting this curve model, Gal- 
braith, Lin and Scott showed in P that the GLS method can be improved in 
practice a further 10%, approximately (see also j 1 91 1 8) i . They also described 
how to write down j-invariant 0 and 1728 curves in Edwards form to combine 
a 4-dimensional decomposition with the fast arithmetic provided by this curve 
model. We exploit this approach and, most remarkably, lift the restriction to 
those special curves and show that in practice the GLV-GLS curves discussed in 
this work may achieve extremely high-performance and become virtually equiv- 
alent in terms of speed when written in Twisted Edwards form. 

In the last years multiple works have incrementally shown the impact of using 
the GLS method for high performance [8119113) . However, it is still unclear how 
well the method behaves on settings where side-channel attacks are a threat. Since
it is usually assumed that required countermeasures once in place degrade performance
significantly, it is also unclear if the GLS method would retain its current
superiority in the case of side-channel protected implementations. Here, we study 
this open problem and describe how to protect implementations based on the GLV- 
GLS method against timing attacks, cache attacks and similar ones and still achieve 
very high performance. The techniques discussed naturally apply to GLV-based 
implementations in general. Finally, we discuss different strategies to implement 
GLV-based scalar multiplication on modern multicore processors, and include the 
case in which countermeasures against side-channel attacks are required. 

The presented implementations corresponding to the GLV-GLS method 
improve the state-of-the-art performance of point multiplication for all the cases 
under study: protected and unprotected versions with sequential and parallel 
execution. For instance, on one core of an Intel Core i7-2600 processor and at 
roughly 128 bits of security, we compute an unprotected scalar multiplication in 
only 91,000 cycles (which is 1.34 times faster than a previous result reported by
Hu, Longa and Xu in im) and a side-channel protected scalar multiplication
in only 137,000 cycles (which is 1.42 times faster than the protected implementation
presented by Bernstein et al. in |3J).

Related Work. Recently, a paper by Zhou, Hu, Xu and Song PHI has shown 
that it is possible to combine the GLV and GLS approaches by introducing a 
three-dimensional version of the GLV method, which seems to work to a certain 
degree, with however no justification but through practical implementations. 
The first author together with Hu and Xu PH! studied the case of curves with 
j-invariant 0 and provided a bound for this particular case. Our analysis supple- 
ments PHI by considering all GLV curves and providing a unified treatment. 

2 The GLV Method 

In this section we briefly summarize the GLV method following (23 . Let $E$ be
an elliptic curve defined over a finite field $\mathbb{F}_q$ and $P$ be a point on this curve
with prime order $n$ such that the cofactor $h = \#E(\mathbb{F}_q)/n$ is small, say $h < 4$.
Let us consider $\Phi$ a nontrivial endomorphism defined over $\mathbb{F}_q$ and $X^2 + rX + s$
its characteristic polynomial. In all the examples $r$ and $s$ are actually small
fixed integers and $q$ is varying in some family. By hypothesis there is only one
subgroup of order $n$ in $E(\mathbb{F}_q)$, implying that $\Phi(P) = \lambda P$ for some $\lambda \in [0, n-1]$,
since $\Phi(P)$ has order dividing the prime $n$. In particular, $\lambda$ is obtained as a root
of $X^2 + rX + s$ modulo $n$.

Define the group homomorphism (the GLV reduction map)

\[ f : \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}/n, \qquad (i, j) \mapsto i + \lambda j \pmod n . \]

Let $\mathcal{K} = \ker f$. It is a sublattice of $\mathbb{Z} \times \mathbb{Z}$ of rank 2 since the quotient is finite. Let
$\kappa > 0$ be a constant (depending on the curve) such that we can find $v_1, v_2$, two
linearly independent vectors of $\mathcal{K}$ satisfying $\max\{|v_1|, |v_2|\} < \kappa\sqrt{n}$, where $|\cdot|$
denotes the rectangle norm¹. Express $(k, 0) = \beta_1 v_1 + \beta_2 v_2$, where $\beta_i \in \mathbb{Q}$. Then
round $\beta_i$ to the nearest integer $b_i = \lfloor \beta_i \rceil = \lfloor \beta_i + 1/2 \rfloor$ and let $v = b_1 v_1 + b_2 v_2$.
Note that $v \in \mathcal{K}$ and that $u = (k, 0) - v$ is short. Indeed, since
$u = (\beta_1 - b_1)v_1 + (\beta_2 - b_2)v_2$ with $|\beta_i - b_i| \le 1/2$, the triangle inequality
gives

\[ |u| \le \frac{|v_1| + |v_2|}{2} < \kappa\sqrt{n} . \]

If we set $(k_1, k_2) = u$, then we get $k \equiv k_1 + k_2\lambda \pmod n$ or equivalently $kP =
k_1P + k_2\Phi(P)$, with $\max(|k_1|, |k_2|) < \kappa\sqrt{n}$.
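
The rounding step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy modulus n = 10009, the polynomial X^2 + X + 1 (r = s = 1) and the helper names `gauss_reduce` and `glv_decompose` are hypothetical and not taken from the paper, and the short basis is obtained by Lagrange-Gauss reduction rather than by the extended Euclidean procedure of the original GLV method.

```python
from fractions import Fraction
from math import floor

def gauss_reduce(v1, v2):
    """Lagrange-Gauss reduction of a rank-2 integer lattice basis."""
    norm2 = lambda v: v[0] * v[0] + v[1] * v[1]
    if norm2(v1) > norm2(v2):
        v1, v2 = v2, v1
    while True:
        # Subtract the nearest-integer multiple of v1 from v2.
        m = round(Fraction(v1[0] * v2[0] + v1[1] * v2[1], norm2(v1)))
        v2 = (v2[0] - m * v1[0], v2[1] - m * v1[1])
        if norm2(v2) >= norm2(v1):
            return v1, v2
        v1, v2 = v2, v1

def glv_decompose(k, v1, v2):
    """Return u = (k1, k2) = (k, 0) - v, where v is the rounded lattice vector."""
    det = v1[0] * v2[1] - v2[0] * v1[1]
    beta1 = Fraction(k * v2[1], det)       # (k, 0) = beta1*v1 + beta2*v2
    beta2 = Fraction(-k * v1[1], det)
    b1 = floor(beta1 + Fraction(1, 2))     # nearest integer to beta1
    b2 = floor(beta2 + Fraction(1, 2))     # nearest integer to beta2
    return (k - b1 * v1[0] - b2 * v2[0], -b1 * v1[1] - b2 * v2[1])

# Toy parameters (assumptions, not values from the text).
n, r, s = 10009, 1, 1
lam = next(x for x in range(n) if (x * x + r * x + s) % n == 0)
# ker f is generated by (n, 0) and (-lam, 1).
v1, v2 = gauss_reduce((n, 0), (-lam, 1))
k1, k2 = glv_decompose(1234, v1, v2)
```

Exact rational arithmetic is used for the coefficients so that the rounding is not affected by floating-point error; the pair (k1, k2) then satisfies the congruence and the triangle-inequality bound stated above.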

In ESI, the optimal value of $\kappa$ (with respect to large values of $n$, i.e. large
fields, keeping $X^2 + rX + s$ constant) is determined. Let $\Delta = r^2 - 4s$ be the
discriminant of the characteristic polynomial of $\Phi$. Then the optimal $\kappa$ is given
by the following result².

Theorem 1 ( j'ifil Theorem 4]). Assuming $n$ is the norm of an element of
$\mathbb{Z}[\Phi]$, then the optimal value of $\kappa$ is

if $r$ is odd,
if $r$ is even.


3 The GLS Improvement 

In 2009, Galbraith, Lin and Scott jH] realised that we do not need to have
$\Phi^2 + r\Phi + s = 0$ in $\mathrm{End}(E)$ but only in a subgroup of $E(\mathbb{F})$ for a specific finite
field $\mathbb{F}$. In particular, considering $\Psi = \mathrm{Frob}_p$ the $p$-Frobenius endomorphism of
a curve $E$ defined over $\mathbb{F}_p$, we know that $\Psi^m(P) = P$ for all $P \in E(\mathbb{F}_{p^m})$. While
this tells nothing useful if $m = 1, 2$, it does offer new nontrivial relations for
higher degree extensions. The case $m = 4$ is particularly useful here.

In this case if $P \in E(\mathbb{F}_{p^4}) \setminus E(\mathbb{F}_{p^2})$ then $\Psi^2(P) = -P$ and hence on the
subgroup generated by $P$, $\Psi$ satisfies the equation $X^2 + 1 = 0$. This implies
that if $\Psi(P)$ is a multiple of $P$ (which happens as soon as the order $n$ of $P$ is
sufficiently large, say at least $2p$), we can apply the GLV approach and split again
a scalar multiplication as $kP = k_1P + k_2\Psi(P)$, with $\max(|k_1|, |k_2|) = O(\sqrt{n})$.
Contrast this with the characteristic polynomial of $\Psi$, which is $X^2 - a_pX + p$
for some integer $a_p$, a non-constant polynomial to which we cannot apply the GLV
paradigm as efficiently.

For efficiency reasons however one does not work with $E/\mathbb{F}_{p^4}$ directly but
with $E'/\mathbb{F}_{p^2}$ isomorphic to $E$ over $\mathbb{F}_{p^4}$ but not over $\mathbb{F}_{p^2}$, that is, a quadratic
twist over $\mathbb{F}_{p^2}$. In this case, it is possible that $\#E'(\mathbb{F}_{p^2}) = n > (p-1)^2$ be
prime. Furthermore, if $\psi : E' \to E$ is an isomorphism defined over $\mathbb{F}_{p^4}$, then the
endomorphism $\Psi = \psi\,\mathrm{Frob}_p\,\psi^{-1} \in \mathrm{End}(E')$ satisfies the equation $X^2 + 1 = 0$
and if $p \equiv 5 \pmod 8$ it can be defined over $\mathbb{F}_p$.

¹ The rectangle norm of $(x, y)$ is by definition $\max(|x|, |y|)$. As remarked in ESI, we
can replace it by any other metric norm. We will use the term “short” to denote
smallness in the rectangle norm.

² There is a mistake in ESI in the derivation of $\kappa$ for odd values of $r$. This affects ESI
Corollary 1] for curves $E_2$ and $E_3$, where the correct values of $\kappa$ are respectively $2/3$
and $4\sqrt{2}/7$.

This idea is at the heart of the GLS approach, but it only works for curves
over $\mathbb{F}_{p^m}$ with $m > 1$, therefore it does not generalise the original GLV method
but rather complements it.

4 Combining GLV and GLS 

Let $E/\mathbb{F}_p$ be a GLV curve. As in Section 3 we will denote by $E'/\mathbb{F}_{p^2}$ a quadratic
twist $\mathbb{F}_{p^4}$-isomorphic to $E$ via the isomorphism $\psi : E \to E'$. We also suppose
that $\#E'(\mathbb{F}_{p^2}) = nh$ where $n$ is prime and $h < 4$. We then have the two
endomorphisms of $E'$, $\Psi = \psi\,\mathrm{Frob}_p\,\psi^{-1}$ and $\Phi = \psi\phi\psi^{-1}$, with $\phi$ the GLV
endomorphism coming with the definition of a GLV curve. They are both defined
over $\mathbb{F}_{p^2}$, since if $\sigma$ is the nontrivial Galois automorphism of $\mathbb{F}_{p^4}/\mathbb{F}_{p^2}$, then
$\psi^\sigma = -\psi$, so that $\Psi^\sigma = \psi^\sigma\,\mathrm{Frob}_p\,(\psi^{-1})^\sigma = (-\psi)\,\mathrm{Frob}_p\,(-\psi^{-1}) = \Psi$, meaning
that $\Psi \in \mathrm{End}_{\mathbb{F}_{p^2}}(E')$. Similarly for $\Phi$, where we are using the fact that
$\phi \in \mathrm{End}_{\mathbb{F}_p}(E)$. Notice that $\Psi^2 + 1 = 0$ and that $\Phi$ has the same characteristic
polynomial as $\phi$. Furthermore, since we have a large subgroup $\langle P \rangle \subset E'(\mathbb{F}_{p^2})$
of prime order, $\Phi(P) = \lambda P$ and $\Psi(P) = \mu P$ for some $\lambda, \mu \in [1, n-1]$. We
will assume that $\Phi$ and $\Psi$, when viewed as algebraic integers, generate disjoint
quadratic extensions of $\mathbb{Q}$. In particular, we are not dealing with Example 1 from
Appendix 0 but this can be treated separately with a quartic twist as described
in Appendix B of the full version of this article EH-

Consider the biquadratic (Galois of degree 4, with Galois group $\mathbb{Z}/2 \times \mathbb{Z}/2$)
number field $K = \mathbb{Q}(\Phi, \Psi)$. Let $\mathfrak{o}_K$ be its ring of integers. The following analysis
is inspired by [23, Section 8].

We have $\mathbb{Z}[\Phi, \Psi] \subset \mathfrak{o}_K$. Since the degrees of $\Phi$ and $\Psi$ are much smaller when
compared to $n$, the prime $n$ is unramified in $K$ and the existence of $\lambda$ and $\mu$
above means that $n$ splits in $\mathbb{Q}(\Phi)$ and $\mathbb{Q}(\Psi)$, namely that $n$ splits completely
in $K$. There exists therefore a prime ideal $\mathfrak{n}$ of $\mathfrak{o}_K$ dividing $n\mathfrak{o}_K$, such that its
norm is $n$. We can also suppose that $\Phi \equiv \lambda \pmod{\mathfrak{n}}$ and $\Psi \equiv \mu \pmod{\mathfrak{n}}$. The
four-dimensional GLV-GLS method works as follows.

Consider the GLV-GLS reduction map $F$ defined by

\[ F : \mathbb{Z}^4 \to \mathbb{Z}/n, \qquad (x_1, x_2, x_3, x_4) \mapsto x_1 + x_2\lambda + x_3\mu + x_4\lambda\mu \pmod n . \]

If we can find four linearly independent vectors $v_1, \ldots, v_4 \in \ker F$, with
$\max_i |v_i| < Cn^{1/4}$ for some constant $C > 0$, then for any $k \in [1, n-1]$ we
write

\[ (k, 0, 0, 0) = \sum_{j=1}^{4} \beta_j v_j , \]

with $\beta_j \in \mathbb{Q}$. As in the GLV method one sets $v = \sum_{j=1}^{4} \lfloor \beta_j \rceil v_j$ and

\[ u = (k, 0, 0, 0) - v = (k_1, k_2, k_3, k_4) . \]

We then get

\[ kP = k_1P + k_2\Phi(P) + k_3\Psi(P) + k_4\Psi\Phi(P) \quad \text{with } \max_i(|k_i|) < 2Cn^{1/4} . \tag{1} \]

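We focus next on the study of $\ker F$ in order to find a reduced basis $v_1, v_2, v_3, v_4$
with an explicit $C$. We can factor the GLV-GLS map $F$ as

\[ \mathbb{Z}^4 \xrightarrow{\ f\ } \mathbb{Z}[\Phi, \Psi] \longrightarrow \mathbb{Z}/n \]
\[ (x_1, x_2, x_3, x_4) \mapsto x_1 + x_2\Phi + x_3\Psi + x_4\Phi\Psi \mapsto x_1 + x_2\lambda + x_3\mu + x_4\lambda\mu \pmod n . \]

Notice that the kernel of the second map (reduction mod $\mathfrak{n} \cap \mathbb{Z}[\Phi, \Psi]$) is exactly
$\mathfrak{n} \cap \mathbb{Z}[\Phi, \Psi]$. This can be seen as follows. The reduction map factors as

\[ \mathbb{Z}[\Phi, \Psi] \longrightarrow \mathfrak{o}_K \longrightarrow \mathfrak{o}_K/\mathfrak{n} \cong \mathbb{Z}/n \]

where the first arrow is inclusion, the second is reduction mod $\mathfrak{n}$, corresponding
to reducing the $x_i$'s mod $\mathfrak{n} \cap \mathbb{Z} = n\mathbb{Z}$ and using $\Phi \equiv \lambda$, $\Psi \equiv \mu \pmod{\mathfrak{n}}$. But the
kernel of this map consists precisely of elements of $\mathbb{Z}[\Phi, \Psi]$ which are in $\mathfrak{n}$, and
that is what we want.

Moreover, since the reduction map is surjective, we obtain an isomorphism
$\mathbb{Z}[\Phi, \Psi]/(\mathfrak{n} \cap \mathbb{Z}[\Phi, \Psi]) \cong \mathbb{Z}/n$ which says that the index of $\mathfrak{n} \cap \mathbb{Z}[\Phi, \Psi]$ inside $\mathbb{Z}[\Phi, \Psi]$ is
$n$. Since the first map $f$ is an isomorphism, we get that $\ker F = f^{-1}(\mathfrak{n} \cap \mathbb{Z}[\Phi, \Psi])$
and that $\ker F$ has index $[\mathbb{Z}^4 : \ker F] = n$ inside $\mathbb{Z}^4$.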

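We can also produce a basis of $\ker F$ by the following observation. Let $\Phi' =
\Phi - \lambda$, $\Psi' = \Psi - \mu$, hence $\Phi'\Psi' = \Phi\Psi - \lambda\Psi - \mu\Phi + \lambda\mu$. In matrix form,

\[
\begin{pmatrix} 1 \\ \Phi' \\ \Psi' \\ \Phi'\Psi' \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 & 0 \\ -\lambda & 1 & 0 & 0 \\ -\mu & 0 & 1 & 0 \\ \lambda\mu & -\mu & -\lambda & 1 \end{pmatrix}
\begin{pmatrix} 1 \\ \Phi \\ \Psi \\ \Phi\Psi \end{pmatrix} .
\]

Since the determinant of the square matrix is 1, we deduce that $\mathbb{Z}[\Phi, \Psi] =
\mathbb{Z}[\Phi', \Psi']$. But in this new basis, we claim that

\[ \mathfrak{n} \cap \mathbb{Z}[\Phi', \Psi'] = n\mathbb{Z} + \mathbb{Z}\Phi' + \mathbb{Z}\Psi' + \mathbb{Z}\Phi'\Psi' . \]

Indeed, the reverse inclusion ($\supseteq$) is easy since $\Phi', \Psi', \Phi'\Psi' \in \mathfrak{n}$ and so is $n$, because $\mathfrak{n}$
divides $n\mathfrak{o}_K$ is equivalent to $\mathfrak{n} \supseteq n\mathfrak{o}_K$. On the other hand, the index of both sides
in $\mathbb{Z}[\Phi', \Psi']$ is $n$, which can only happen, once an inclusion is proved, if the two sides
are equal. Using the isomorphism $f$, we see that a basis of $\ker F \subset \mathbb{Z}^4$ is therefore
given by

\[ w_1 = (n, 0, 0, 0), \quad w_2 = (-\lambda, 1, 0, 0), \quad w_3 = (-\mu, 0, 1, 0), \quad w_4 = (\lambda\mu, -\mu, -\lambda, 1) . \]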
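
As a quick sanity check of this basis, the following Python sketch verifies on a toy instance that each $w_i$ lies in $\ker F$ and that the lattice they span has index $n$ in $\mathbb{Z}^4$. The values n = 13, the polynomial X^2 + X + 1 for $\Phi$, and the helper names `F` and `det` are illustrative assumptions and do not come from the paper.

```python
def F(x, n, lam, mu):
    """GLV-GLS reduction map Z^4 -> Z/n."""
    return (x[0] + x[1] * lam + x[2] * mu + x[3] * lam * mu) % n

def det(m):
    """Determinant of a small integer matrix by cofactor expansion."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

# Toy values (assumptions): n = 13, Phi^2 + Phi + 1 = 0, Psi^2 + 1 = 0.
n = 13
lam = min(x for x in range(n) if (x * x + x + 1) % n == 0)   # root standing in for lambda
mu = min(x for x in range(n) if (x * x + 1) % n == 0)        # root standing in for mu

basis = [
    [n, 0, 0, 0],
    [-lam, 1, 0, 0],
    [-mu, 0, 1, 0],
    [lam * mu, -mu, -lam, 1],
]
in_kernel = all(F(w, n, lam, mu) == 0 for w in basis)
index = abs(det(basis))   # index of the spanned lattice in Z^4
```

Because the basis matrix is lower triangular with diagonal $(n, 1, 1, 1)$, the determinant equals $n$, in agreement with $[\mathbb{Z}^4 : \ker F] = n$.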

The LLL algorithm H 2 then finds, for a given basis $w_1, \ldots, w_4$ of $\ker F$, a
reduced³ basis $v_1, \ldots, v_4$ in polynomial time (in the logarithm of the norm of the
$w_i$'s) such that (cf. [3, Theorem 2.6.2, p. 85])

\[ \prod_{i=1}^{4} |v_i| \le 8\,[\mathbb{Z}^4 : \ker F] = 8n . \tag{2} \]

³ The estimates are usually given for the Euclidean norm of the vectors. But it is easy
to see that the rectangle norm is upper bounded by the Euclidean norm.


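Lemma 1. Let $\Phi$ and $\Psi$ be as defined at the beginning of this section, and let

\[ N : \mathbb{Z}^4 \to \mathbb{Z}, \qquad (x_1, x_2, x_3, x_4) \mapsto \sum_{i_1, i_2, i_3, i_4} b_{i_1, i_2, i_3, i_4}\, x_1^{i_1} x_2^{i_2} x_3^{i_3} x_4^{i_4} \]

be the norm of an element $x_1 + x_2\Phi + x_3\Psi + x_4\Phi\Psi \in \mathbb{Z}[\Phi, \Psi]$, where the $b_{i_1, i_2, i_3, i_4}$'s
lie in $\mathbb{Z}$. Then, for any nonzero $v \in \ker F$, one has

\[ |v| \ge \frac{n^{1/4}}{\left( \sum_{i_1, i_2, i_3, i_4} |b_{i_1, i_2, i_3, i_4}| \right)^{1/4}} . \tag{3} \]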


Proof. For $v \in \ker F$ we have $N(v) \equiv 0 \pmod n$ and if $v \ne 0$ we must therefore
have $|N(v)| \ge n$. On the other hand, if we did not have (3), then every component
of $v$ would be strictly less than the right-hand side and plugging this upper bound
in the definition of $|N(v)|$ would yield a quantity $< n$, a contradiction. □


Let $B$ be the denominator of the right-hand side of (3), then (2) and (3) imply
that

\[ |v_i| \le 8B^3 n^{1/4}, \qquad i = 1, 2, 3, 4. \tag{4} \]

Remark 1. In our case, where $\Psi^2 + 1 = 0$ and $\Phi^2 + r\Phi + s = 0$, we get as norm
function

\[
\begin{aligned}
& x_1^4 + s^2x_2^4 + x_3^4 + s^2x_4^4 - 2rx_1^3x_2 - 2rsx_1x_2^3 - 2rx_3^3x_4 - 2rsx_3x_4^3 \\
&\quad + (r^2 + 2s)x_1^2x_2^2 + 2x_1^2x_3^2 + (r^2 - 2s)x_1^2x_4^2 + (r^2 - 2s)x_2^2x_3^2 + 2s^2x_2^2x_4^2 + (r^2 + 2s)x_3^2x_4^2 \\
&\quad - 2rx_1^2x_3x_4 - 2rsx_2^2x_3x_4 - 2rx_1x_2x_3^2 - 2rsx_1x_2x_4^2 + 8sx_1x_2x_3x_4 ,
\end{aligned}
\]

and therefore

\[ B = \left( 4 + 4s^2 + 8s + 8|r| + 8|r|s + 2(r^2 + 2s) + 2|r^2 - 2s| \right)^{1/4} . \tag{5} \]
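
The norm form and the constant $B$ can be cross-checked numerically. The Python sketch below is an illustration only: it computes the norm of $x_1 + x_2\Phi + x_3\Psi + x_4\Phi\Psi$ in two ways, once through the tower $\mathbb{Q} \subset \mathbb{Q}(\Phi) \subset K$ (using $\Psi^2 = -1$ and $\Phi^2 = -r\Phi - s$) and once with the explicit quartic form, and evaluates $B^4$ from (5). The function names are hypothetical.

```python
def norm_tower(x, r, s):
    """Norm via the tower: first down to Q(Phi), then down to Q."""
    x1, x2, x3, x4 = x

    def mul(a, b):
        # (a0 + a1*Phi)(b0 + b1*Phi) with Phi^2 = -r*Phi - s
        return (a[0] * b[0] - s * a[1] * b[1],
                a[0] * b[1] + a[1] * b[0] - r * a[1] * b[1])

    alpha, beta = (x1, x2), (x3, x4)          # element = alpha + beta*Psi
    a2, b2 = mul(alpha, alpha), mul(beta, beta)
    c, d = a2[0] + b2[0], a2[1] + b2[1]       # relative norm alpha^2 + beta^2 = c + d*Phi
    return c * c - r * c * d + s * d * d      # norm of c + d*Phi over Q

def norm_explicit(x, r, s):
    """Explicit quartic norm form of Remark 1."""
    x1, x2, x3, x4 = x
    return (x1**4 + s*s*x2**4 + x3**4 + s*s*x4**4
            - 2*r*x1**3*x2 - 2*r*s*x1*x2**3 - 2*r*x3**3*x4 - 2*r*s*x3*x4**3
            + (r*r + 2*s)*x1**2*x2**2 + 2*x1**2*x3**2 + (r*r - 2*s)*x1**2*x4**2
            + (r*r - 2*s)*x2**2*x3**2 + 2*s*s*x2**2*x4**2 + (r*r + 2*s)*x3**2*x4**2
            - 2*r*x1**2*x3*x4 - 2*r*s*x2**2*x3*x4 - 2*r*x1*x2*x3**2
            - 2*r*s*x1*x2*x4**2 + 8*s*x1*x2*x3*x4)

def B_fourth(r, s):
    """B^4 from (5): the sum of absolute values of the coefficients (s > 0)."""
    return (4 + 4*s*s + 8*s + 8*abs(r) + 8*abs(r)*s
            + 2*(r*r + 2*s) + 2*abs(r*r - 2*s))

B = B_fourth(1, 1) ** 0.25   # e.g. for Phi^2 + Phi + 1 = 0
```

For $r = s = 1$ the sum of absolute coefficients is 40, so $B = 40^{1/4}$ in that case.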

From (1) and (4) we have proved the following theorem.

Theorem 2. Let $E/\mathbb{F}_p$ be a GLV curve and $E'/\mathbb{F}_{p^2}$ a twist, together with the
two efficient endomorphisms $\Phi$ and $\Psi$, where everything is defined as at the start
of this section. Suppose that the minimal polynomial of $\Phi$ is $X^2 + rX + s = 0$.
Let $P \in E'(\mathbb{F}_{p^2})$ be a generator of the large subgroup of prime order $n$. There
exists an efficient algorithm, which for any $k \in [1, n]$ finds integers $k_1, k_2, k_3, k_4$
such that

\[ kP = k_1P + k_2\Phi(P) + k_3\Psi(P) + k_4\Psi\Phi(P) \quad \text{with } \max_i(|k_i|) < 16B^3 n^{1/4} \]

and

\[ B = \left( 4 + 4s^2 + 8s + 8|r| + 8|r|s + 2(r^2 + 2s) + 2|r^2 - 2s| \right)^{1/4} . \]


Four-Dimensional Gallant-Lambert-Vanstone Scalar Multiplication 725 


4.1 Uniform Improvements 

The previous analysis is only the first step of our work. It shows that the GLV-GLS
method works as predicted in a four-way decomposition on twists of GLV
curves over $\mathbb{F}_{p^2}$. However, the constant $B^3$ involved is rather large and, hence,
does not guarantee a non-negligible gain when switching from 2 to 4 dimensions
(especially on those GLV curves with more complicated endomorphism rings).
A much deeper argument, which we develop in the full version of this article,
allows us to prove the following result.

Theorem 3. When performing an optimal lattice reduction on $\ker F$, it is possible
to decompose any $k \in [1, n]$ into integers $k_1, k_2, k_3, k_4$ such that

\[ kP = k_1P + k_2\Phi(P) + k_3\Psi(P) + k_4\Psi\Phi(P) , \]

with $\max_i(|k_i|) < 103\left(\sqrt{1 + |r| + s}\right) n^{1/4}$.

The significance of this theorem lies in the uniform improvement of the constant
$16B^3$, which is $\Omega(s^{3/2})$ in Theorem 2, to a value that is an absolute constant
times greater than the minimal bound for the 2-dimensional GLV method (Theorem 1).
Hence, this guarantees in practice a quasi-uniform improvement when
switching from 2-dimensional to 4-dimensional GLV independently of the curve.

To prove Theorem 3, first note that Lemma 1 gives a rather poor bound
when applied to more than one vector, as is done three times for the proof of
Theorem 2. A more direct treatment of the reduced vectors of $\ker F$ becomes
necessary, and this is done via a modification of the original GLV approach. This
results in a new, easy-to-implement lattice reduction algorithm which employs
two Cornacchia-type algorithms 0 Section 1.5.2], one in $\mathbb{Z}$ (as in the original
GLV method), the other one in $\mathbb{Z}[i]$ (Gaussian Cornacchia). The new algorithm
is presented in Appendix |E1. The main difficulty lies in controlling arguments
of complex numbers in the Gaussian Cornacchia algorithm and is technically
rather delicate. This difficulty does not exist in the original GLV algorithm, as
taking absolute values suffices to get the desired bounds. We refer to the full
version EH for details.

Remark 2. In the case of the LLL algorithm, we have not managed to demon- 
strate a bound as good as the one obtained with our lattice reduction algorithm. 

Remark 3. Nguyen and Stehle m have produced an efficient lattice reduction 
in four dimensions which finds successive minima and hence produces a decom- 
position with relatively good bounds. Our algorithm represents a very simple 
and easy-to-implement alternative that may be ideal for certain cryptographic 
libraries. 

5 GLV-GLS Using the Twisted Edwards Model 

The GLV-GLS method can be sped up in practice by writing down GLV-GLS
curves in the Twisted Edwards model. Note that arithmetic on $j$-invariant 0
Weierstrass curves is already very efficient. However, some GLV curves do not
exhibit such high-speed arithmetic. In particular, curves in Examples 3-6 from
Appendix E] have Weierstrass coefficients $a_4 \cdot a_6 \ne 0$ for curve parameters $a_4$
and $a_6$ and hence they have more expensive point doubling (even more if we
consider the extra multiplication by the twisted parameter $u$ when using the GLS
method). So the impact of using Twisted Edwards is expected to be especially
significant for these curves. In fact, if we consider that suitable parameters can
always be chosen, the use of Twisted Edwards curves isomorphic to the original
Weierstrass GLV-GLS curves makes the performance of all of them uniform.

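Let us illustrate how to produce a Twisted Edwards GLV-GLS curve with the
GLV curve from Example 4, Appendix 0. First, consider its quadratic twist over
$\mathbb{F}_{p^2}$,

\[ E'/\mathbb{F}_{p^2} : y^2 = x^3 - \tfrac{15}{2}u^2x - 7u^3 = (x + 2u) \cdot \left( x^2 - 2ux - \tfrac{7}{2}u^2 \right) . \]

The change of variables $x_1 = x + 2u$ transforms $E'$ into

\[ y^2 = x_1^3 - 6ux_1^2 + \frac{9u^2}{2}x_1 . \]

Let $\beta = 3u/\sqrt{2} \in \mathbb{F}_{p^2}$ and substitute $x_1 = \beta x'$ to get

\[ \frac{1}{\beta^3}y^2 = x'^3 - \frac{6u}{\beta}x'^2 + x' , \]

and this is a Montgomery curve $M_{A,B} : Bv^2 = u^3 + Au^2 + u$, where $A \ne \pm 2$, $B \ne 0$,
with

\[ B = \frac{1}{\beta^3} = \frac{2\sqrt{2}}{27u^3}, \qquad A = -\frac{6u}{\beta} = -2\sqrt{2} . \]

The corresponding Twisted Edwards GLV-GLS curve is then $E_{a,d} : ax^2 + y^2 =
1 + dx^2y^2$ with

\[ a = \frac{A + 2}{B} = 27u^3\,\frac{1 - \sqrt{2}}{\sqrt{2}}, \qquad d = \frac{A - 2}{B} = -27u^3\,\frac{1 + \sqrt{2}}{\sqrt{2}} . \]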



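The map $E' \to E_{a,d}$ is

\[ (x, y) \mapsto \left( \frac{x + 2u}{\beta y},\ \frac{x + 2u - \beta}{x + 2u + \beta} \right) = (X, Y) , \]

with inverse

\[ (X, Y) \mapsto \left( \frac{\beta - 2u + (\beta + 2u)Y}{1 - Y},\ \frac{1 + Y}{(1 - Y)X} \right) . \]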

We now specify the formulas for $\Phi$ and $\Psi$, obtained by composing these
endomorphisms on the Weierstrass model with the birational maps above. We found
an extremely appealing expression in the case when $u = 1 + i$ and $i^2 = -1$. Then
$\beta = 3u/\sqrt{2} = 3\zeta_8$ where $\zeta_8$ is a primitive 8th root of unity. We have

( (Cl + 2Cl + Cs)XY 2 + (Cl - 2Ci + Cs)A (cl - 1)T 2 + 2d - cl + ^ 

( ’ } ~ l W ’ (2C! + C!-l)^ 2 -C! + lJ 

and 

\[ \Psi(X, Y) = \left( \zeta_8 X^p,\ \frac{1}{Y^p} \right) . \]

In this case

\[ a = 54\left( \zeta_8^3 - \zeta_8^2 + 1 \right), \qquad d = -54\left( \zeta_8^3 + \zeta_8^2 - 1 \right) . \]

Finally, one would want to use the efficient formulas given in m for the case
$a = -1$. After ensuring that $-a$ be a square in $\mathbb{F}_{p^2}$, we use the map $(x, y) \mapsto
(x/\sqrt{-a}, y)$ to convert to the isomorphic curve $-x^2 + y^2 = 1 + d'x^2y^2$, where
$d' = -d/a$.

6 Side-Channel Protection and Parallelization of the 
GLV-GLS Method 

Given the potential threat posed by attacks that exploit timing information to 
deduce secret keys ( JlfiUj i. many works have proposed countermeasures to min- 
imize the risks and achieve the so-called constant-time execution during crypto- 
graphic computations. In general, to avoid leakage the execution flow should be 
independent of the secret key. This means that conditional branches and secret- 
dependent table lookup indices should be avoided 1X3- There are five key points 
that are especially vulnerable during the computation of scalar multiplication: 
inversion, modular reduction in field operations, precomputation, scalar recoding 
and double-and-add execution. 

A well-known technique that is secure and easy to implement for inverting
any field element $a$ consists of computing the exponentiation $a^{p-2} \bmod p$ using
a short addition chain for $p - 2$.
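
A minimal Python sketch of this inversion is given below. It is only illustrative: the modulus $2^{127} - 1$ is a stand-in prime rather than one of the primes used in this paper, the function name `invert` is hypothetical, and Python's built-in `pow` replaces the fixed addition chain that a constant-time implementation would use (Python integer arithmetic itself is not constant-time).

```python
def invert(a, p):
    """Inverse of a modulo the prime p via Fermat's little theorem: a^(p-2) mod p."""
    if a % p == 0:
        raise ZeroDivisionError("0 has no inverse")
    return pow(a, p - 2, p)

p = 2**127 - 1      # a known Mersenne prime, used only as a stand-in modulus
a = 123456789
a_inv = invert(a, p)
```

The exponent $p - 2$ is public and fixed, so the sequence of squarings and multiplications does not depend on the secret operand.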

To protect field operations, one may exploit conditional move instructions
typically found on modern x86 and x64 processors (a.k.a. cmove). Since conditional
checks happen during operations such as addition and subtraction as part
of the reduction step, it is standard practice to replace conditional branches with
the conditional move instruction. Luckily, these conditional branches are highly
unpredictable and, hence, the substitution above not only makes the execution
constant-time but also more efficient in most cases. An exception happens
when performing modular reduction during a field multiplication or squaring,
where a final correction step could happen very rarely and hence a conditional
branch may be more efficient.

In the case of precomputation, recent work by H3 and later by [3j showed
how to enable the use of precomputed points by employing constant-time table
lookups that mask the extraction of points. In our implementations (see Section
7), we exploit a similar approach based on cmove and conditional vector instructions
instead, which is expected to achieve higher performance on some platforms
than implementations based on logical instructions (see Listing 1 in [El). Note
that it is straightforward to enable the use of signed-digit representations that
allow negative points by performing a second table lookup between the point
selected in the first table lookup and its negated value.
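
The idea of a masked table lookup can be sketched as follows. This Python fragment only illustrates the logic of the variant based on logical instructions: every entry is read and combined under a mask, so the memory access pattern does not depend on the secret index. The function name `ct_lookup`, the 64-bit word size and the sample table are assumptions made for the example, and Python itself gives no constant-time guarantee.

```python
def ct_lookup(table, index, bits=64):
    """Scan the whole table and keep the entry at `index` using masks only."""
    full = (1 << bits) - 1
    out = [0] * len(table[0])
    for i, entry in enumerate(table):
        diff = i ^ index
        nonzero = ((diff | -diff) >> bits) & 1   # 1 if i != index, else 0
        mask = (nonzero - 1) & full              # all ones only when i == index
        out = [o | (c & mask) for o, c in zip(out, entry)]
    return tuple(out)

# Hypothetical table of precomputed points, given as coordinate pairs.
table = [(11, 12), (21, 22), (31, 32), (41, 42)]
selected = ct_lookup(table, 2)
```

In an optimized implementation the mask-and-combine step would be replaced by cmove or conditional vector instructions, as described above.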

To protect the scalar recoding and its corresponding double-and-add algorithm,
one needs a regular pattern execution. Based on a method by j23J, Joye
and Tunstall m proposed a constant-time recoding that supports a regular execution
double-and-add algorithm that exploits precomputations. The nonzero
density of the method is $1/(w - 1)$, where $w$ is the window width. Therefore,
there is a certain loss in performance in comparison with an unprotected version
with nonzero density $1/(w + 1)$. In GLV-based implementations one has to deal
with more than one scalar, and these scalars are scanned simultaneously during
multi-exponentiation. So two issues arise. First, how are the several
scalars aligned with respect to their zero and nonzero digit representation?
Second, how do we guarantee the same representation length for all scalars
so that no dummy operations are required? The first issue is inherently solved by
the recoding algorithm itself. The input is always an odd number, which means
that, from left to right, one obtains the execution pattern $(w - 1)$ doublings, $d$
additions, $(w - 1)$ doublings, $d$ additions, ..., $(w - 1)$ doublings and $d$ additions,
for $d$-dimensional GLV. For dealing with even numbers, one may employ
the technique described in m in a constant-time fashion, namely, scalars $k_i$ that
are even are replaced by $k_i + 1$ and scalars that are odd are replaced by $k_i + 2$ (the
correction, also constant-time, is performed after the scalar multiplication computation
using $d$ point additions). A solution to the second issue was also hinted at
by EH-. The reader is referred to the full paper version for the modified recoding
algorithm that outputs a regular pattern representation with fixed length. Note
that in the case of Twisted Edwards one can alternatively use unified addition
formulas that also work for doubling (see ('ill ‘1\ for details). However, our analysis
indicates that this approach is consistently slower because of the high cost
of these unified formulas in comparison to doubling and the extra cost incurred
by the increase in constant-time table lookup accesses.
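
To make the regular pattern concrete, the following Python sketch recodes an odd scalar into a fixed number of odd signed digits in radix $2^{w-1}$, in the spirit of the regular recoding discussed above. It is a simplified illustration and not the modified algorithm of the full paper: the function name `regular_recode` and its `length` parameter are hypothetical, and the loop is written for clarity rather than for constant-time execution.

```python
def regular_recode(k, w, length):
    """Recode odd k > 0 into `length` odd digits d_i with |d_i| < 2^(w-1),
    least significant first, such that k = sum(d_i * 2^((w-1)*i))."""
    if w < 2 or k <= 0 or k % 2 == 0:
        raise ValueError("requires w >= 2 and an odd positive scalar")
    radix = 1 << (w - 1)
    digits = []
    for _ in range(length - 1):
        d = (k % (2 * radix)) - radix   # odd digit in [-(radix-1), radix-1]
        digits.append(d)
        k = (k - d) // radix            # exact division; the quotient stays odd
    digits.append(k)                    # top digit (must also be a valid digit)
    return digits

digits = regular_recode(12345, 3, 8)    # w = 3: digits in {-3, -1, 1, 3}
value = sum(d * 4**i for i, d in enumerate(digits))
```

Since every digit is nonzero, a left-to-right evaluation performs exactly $(w - 1)$ doublings followed by one addition per digit, and choosing the same `length` for all sub-scalars $k_i$ aligns their digit positions.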

6.1 Multicore Computation and Its Side-Channel Protection 

Parallelization of scalar multiplication over prime fields is particularly difficult
on modern multicore processors. This is due to the difficulty of performing point
operations concurrently when executing the double-and-add algorithm from left
to right. Right-to-left parallelization is easier, but performance suffers because
the use of precomputations is cumbersome. Hence, parallelization should
ideally be performed at the field arithmetic level. Unfortunately, current multicore
processors still impose a severe overhead for thread creation/destruction.
During our tests we observed overheads of a few thousand cycles on modern
64-bit CPUs (that is, much more costly than a point addition or doubling).
Given this limitation, for the GLV method the ideal approach (from
a speed perspective) seems to be to let each core manage a separate scalar
multiplication with $k_i$. This is simple to implement, minimizes thread management
overhead and also eases the task of protecting the implementation against side-channel
attacks, since each scalar can be recoded using Algorithm 4 [2D App. E]. Using $d$
cores, the total cost of a protected $d$-dimensional GLV $l$-bit scalar multiplication
(disregarding precomputation) is about $l/d$ doublings and $l/((w-1) \cdot d)$ mixed
additions. A somewhat slower (but more power-efficient) approach would be to
let one core manage all doublings and let one or two extra cores manage the
additions corresponding to nonzero digits. For instance, for dimension four and
three cores the total cost (disregarding precomputation) is about $l/d$ doublings
and $l/((w-1) \cdot d)$ general additions, provided that the latency of $(w-1)$ doublings
is equal to or greater than that of the addition part (otherwise, the cost is dominated
by non-mixed additions).
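
The operation-count estimate above can be turned into a small calculator. The following Python sketch is only illustrative: the function name and the sample parameters (a 256-bit scalar, dimension 4, window width 5) are assumptions chosen for the example, not values fixed by the text.

```python
def glv_parallel_cost(l: int, d: int, w: int) -> tuple[float, float]:
    """Estimate (doublings, additions) as l/d and l/((w-1)*d).

    l: bit length of the scalar, d: GLV dimension (= number of cores),
    w: window width of the recoding. Precomputation is disregarded.
    """
    if d < 1 or w < 2:
        raise ValueError("need d >= 1 and window width w >= 2")
    doublings = l / d
    additions = l / ((w - 1) * d)
    return doublings, additions


# Hypothetical parameters, for illustration only.
doublings, additions = glv_parallel_cost(l=256, d=4, w=5)
print(f"about {doublings:.0f} doublings and {additions:.0f} additions")
```

The same counts apply to the three-core variant, with general instead of mixed additions, as long as the doublings remain the bottleneck.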

7 Performance Analysis and Experimental Results 

For our analysis and experiments, we consider the five curves below: two GLV
curves in Weierstrass form with and without nontrivial automorphisms, their
corresponding GLV-GLS counterparts and one curve in Twisted Edwards form
isomorphic to the GLV-GLS curve $E'_3$ (see below).

- GLV-GLS curve with j-invariant 0 in Weierstrass form
  $E'_1/\mathbb{F}_{p_1^2}: y^2 = x^3 + 9u$, where $p_1 = 2^{127} - 58309$ and
  $\#E'_1(\mathbb{F}_{p_1^2}) = r$, where $r$ is a 254-bit prime. We use
  $\mathbb{F}_{p_1^2} = \mathbb{F}_{p_1}[i]/(i^2 + 1)$ and $u = 1 + i \in \mathbb{F}_{p_1^2}$.
  $E'_1$ is the quadratic twist of the curve in Example 2, Appendix A.
  $\Phi(x, y) = \lambda P = (\xi x, y)$ and
  $\Psi(x, y) = \mu P = (u^{(1-p)/3} x^p, u^{(1-p)/2} y^p)$, where $\xi^3 = 1 \bmod p_1$.
  We have that $\Phi^2 + \Phi + 1 = 0$ and $\Psi^2 + 1 = 0$.

- GLV curve with j-invariant 0 in Weierstrass form
  $E_2/\mathbb{F}_{p_2}: y^2 = x^3 + 2$, where $p_2 = 2^{256} - 11733$ and
  $\#E_2(\mathbb{F}_{p_2})$ is a 256-bit prime. This curve corresponds
  to Example 2, Appendix A.

- GLV-GLS curve in Weierstrass form
  $E'_3/\mathbb{F}_{p_3^2}: y^2 = x^3 - \frac{15}{2} u^2 x - 7u^3$,
  where $p_3 = 2^{127} - 5997$ and $\#E'_3(\mathbb{F}_{p_3^2}) = 8r$, where $r$ is a
  251-bit prime. We use $\mathbb{F}_{p_3^2} = \mathbb{F}_{p_3}[i]/(i^2 + 1)$ and
  $u = 1 + i \in \mathbb{F}_{p_3^2}$. $E'_3$ is the quadratic twist of a curve
  isomorphic to the one in Example 4, Appendix A. The formula for
  $\Phi(x, y) = \lambda P$ can be easily derived from $\phi(x, y)$, and
  $\Psi(x, y) = \mu P = (u^{(1-p)} x^p, u^{3(1-p)/2} y^p)$. It can be verified that
  $\Phi^2 + 2 = 0$ and $\Psi^2 + 1 = 0$.

- GLV-GLS curve in Twisted Edwards form
  $E'_{T3}/\mathbb{F}_{p_3^2}: -x^2 + y^2 = 1 + d x^2 y^2$, where
  $d = 170141183460469231731687303715884099728 + 116829086847165810221872975542241037773i$,
  $p_3 = 2^{127} - 5997$ and $\#E'_{T3}(\mathbb{F}_{p_3^2}) = 8r$, where $r$ is a
  251-bit prime. We again use $\mathbb{F}_{p_3^2} = \mathbb{F}_{p_3}[i]/(i^2 + 1)$ and
  $u = 1 + i \in \mathbb{F}_{p_3^2}$. $E'_{T3}$ is isomorphic to curve $E'_3$ above and
  was obtained following the Twisted Edwards construction described earlier in the
  paper, where the formulas for $\Phi(x, y)$ and $\Psi(x, y)$ are also given. It can be
  verified that $\Phi^2 + 2 = 0$ and $\Psi^2 + 1 = 0$.

- GLV curve $E_4/\mathbb{F}_{p_4}: y^2 = x^3 - \frac{15}{2} x - 7$, where
  $p_4 = 2^{256} - 45717$ and $\#E_4(\mathbb{F}_{p_4}) = 2r$, where $r$ is a 256-bit
  prime. This curve is isomorphic to the curve in Example 4, Appendix A.

Let us first analyze the performance of the GLV-GLS method over $\mathbb{F}_{p^2}$ in
comparison with the traditional 2-GLV case over $\mathbb{F}_p$. We assume the use of a
pseudo-Mersenne prime of the form $p = 2^m - c$, for small $c$ (for our targeted curves,
groups with (near) prime order cannot be constructed using the attractive Mersenne
prime $p = 2^{127} - 1$). Given that we have a proven ratio $C_2/C_1 < 412$ that is
independent of the curve, the only values left that could significantly affect a
uniform speedup between GLV-GLS and 2-GLV are the quadratic non-residue $\beta$
used to build $\mathbb{F}_{p^2}$ as $\mathbb{F}_p[i]/(i^2 - \beta)$, the value of the
twisted parameter $u$ and the cost of applying the endomorphisms $\Phi$ and $\Psi$.
In particular, if $|\beta| > 1$ a few extra additions (or a multiplication by a small
constant) are required per $\mathbb{F}_{p^2}$ multiplication and squaring. Luckily, for
all the GLV curves listed in Appendix A one can always use a suitably chosen modulus
$p$ so that $|\beta|$ is one or at least very close to it. Similar comments apply to
the twisted parameter $u$. In this case, the extra cost (equivalent to a few additions)
is added to the cost of point doubling whenever the curve parameter $a$ in the
Weierstrass equation is nonzero (e.g., it does not affect j-invariant 0 curves). In the
case of Twisted Edwards, we applied a better strategy: we eliminated the twisted
parameter $u$ in the isomorphic curve. The cost of applying $\Phi$ and $\Psi$ does
depend on the chosen curve and could be relatively expensive. If computing $\Phi(P)$,
$\Psi(P)$ or $\Psi\Phi(P)$ is more expensive than a point addition, then its use can be
limited to a single application (i.e., multiples of those values, if using
precomputations, should be computed with point additions). Further, the extra cost can
be minimized by choosing the optimal window width for each $k_j$.
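
To make the role of $\beta$ concrete, here is a minimal Python sketch of multiplication in $\mathbb{F}_{p^2} = \mathbb{F}_p[i]/(i^2 - \beta)$. It is not the authors' implementation: the function names are hypothetical, Python big integers stand in for the optimized field arithmetic, and the three-multiplication (Karatsuba-style) formula is one common choice among several. With $\beta = -1$ the product $\beta \cdot a_1 b_1$ reduces to a subtraction, which is why $|\beta| = 1$ avoids extra work.

```python
P3 = 2**127 - 5997  # p_3 from the curve list; P3 % 4 == 3, so i^2 + 1 is irreducible if P3 is prime


def fp2_mul(a, b, p=P3):
    """Multiply a = a0 + a1*i by b = b0 + b1*i in F_p[i]/(i^2 + 1), i.e. beta = -1."""
    a0, a1 = a
    b0, b1 = b
    t0 = a0 * b0 % p
    t1 = a1 * b1 % p
    # Karatsuba-style trick: three F_p multiplications instead of four.
    t2 = (a0 + a1) * (b0 + b1) % p
    return ((t0 - t1) % p, (t2 - t0 - t1) % p)


def fp2_mul_beta(a, b, beta, p=P3):
    """Same product in F_p[i]/(i^2 - beta); the multiplication by beta is the extra cost."""
    a0, a1 = a
    b0, b1 = b
    t0 = a0 * b0 % p
    t1 = a1 * b1 % p
    t2 = (a0 + a1) * (b0 + b1) % p
    return ((t0 + beta * t1) % p, (t2 - t0 - t1) % p)


u = (1, 1)            # the twisting parameter u = 1 + i
print(fp2_mul(u, u))  # (1 + i)^2 = 2i
```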

To illustrate how the parameters above affect the performance gain, we detail
in Table 1 estimates for the cost of computing scalar multiplication with our
representative curves. In the remainder, we use the following notation: M, S, A
and I represent field multiplication, squaring, addition and inversion over
$\mathbb{F}_p$, respectively, and m, s, a and i represent the same operations over
$\mathbb{F}_{p^2}$. Side-channel protected multiplication and squaring are denoted by
$m_s$ and $s_s$. We consider the cost of addition, subtraction, negation,
multiplication by 2 and division by 2 as equivalent. For the targeted curves in
Weierstrass form, a mixed addition consists of 8 multiplications, 3 squarings and
7 additions, and a general addition consists of 12 multiplications, 4 squarings and
7 additions. For $E'_1$ and $E_2$, a doubling consists of 3 multiplications,
4 squarings and 7 additions, and for $E'_3$ and $E_4$, a doubling consists of
3 multiplications, 6 squarings and 12 additions. For Twisted Edwards we consider the
use of mixed homogeneous/extended homogeneous projective coordinates. In this case,
a mixed addition consists of 7 multiplications and 7 additions, a general addition
consists of 8 multiplications and 6/7 additions and a doubling consists of
4 multiplications, 3 squarings and 5 additions. We assume the use of interleaving
with width-$w$ non-adjacent form ($w$NAF) and the use of the LM scheme for
precomputing points on the Weierstrass curves [20] (see also [HU Ch. 3]).
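
As an illustration of how such operation counts translate into a single figure, the sketch below converts the per-operation counts quoted above into multiplication equivalents, using the ratios 1s = 0.76m and 1a = 0.18m stated in the Table 1 caption for the curves over $\mathbb{F}_{p^2}$. The dictionary layout and function name are assumptions made for this example, and the Twisted Edwards general addition is taken with 6 additions (the text gives 6/7).

```python
# Ratios from the Table 1 caption for E'_1, E'_3 and E'_T3 (operations over F_{p^2}).
S_OVER_M = 0.76
A_OVER_M = 0.18

# (multiplications, squarings, additions) per point operation, as quoted in the text.
OPERATION_COUNTS = {
    "Weierstrass mixed addition": (8, 3, 7),
    "Weierstrass general addition": (12, 4, 7),
    "Weierstrass doubling (E'_1, E_2)": (3, 4, 7),
    "Weierstrass doubling (E'_3, E_4)": (3, 6, 12),
    "Twisted Edwards mixed addition": (7, 0, 7),
    "Twisted Edwards general addition": (8, 0, 6),  # assumption: lower value of "6/7"
    "Twisted Edwards doubling": (4, 3, 5),
}


def cost_in_multiplications(counts, s_over_m=S_OVER_M, a_over_m=A_OVER_M):
    """Express an (m, s, a) operation count as an equivalent number of multiplications."""
    mults, squarings, additions = counts
    return mults + s_over_m * squarings + a_over_m * additions


for name, counts in OPERATION_COUNTS.items():
    print(f"{name}: {cost_in_multiplications(counts):.2f}m")
```

Under these ratios a Twisted Edwards doubling is cheaper than either Weierstrass doubling, which is in line with the gains reported for $E'_{T3}$.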

According to our theoretical estimates, the relative speedup when moving from
2-GLV to GLV-GLS is expected to be as high as 50%, approximately. To confirm
our findings, we produced full implementations of the methods. Experimental
results, also displayed in Table 1, closely follow our estimates and confirm that
speedups in practice are about 52%. Most remarkably, the use of the Twisted Edwards
model pushes performance even further. In Table 1, the expected gains for
$E'_{T3}$ are 31% and 97% in comparison with 4-GLV-GLS and 2-GLV in Weierstrass
form, respectively. In practice, we achieved similar speedups, namely 33% and 102%,
respectively. Likewise, a rough analysis indicates that a Twisted Edwards GLV-GLS
curve for a j-invariant 0 curve would achieve roughly similar speed to $E'_{T3}$, which
means that in comparison to its corresponding Weierstrass counterpart the gains
are 9% and 66%, respectively. This highlights the impact of using Twisted Edwards,
especially over those GLV-GLS curves that are relatively slower in the Weierstrass
model. Timings were registered on a single core of a 3.4GHz Intel Core i7-2600
(Sandy Bridge) processor.

Table 1. Operation counts and performance for scalar multiplication at approximately
128 bits of security. To determine the total costs we consider 1i = 66m, 1s = 0.76m and
1a = 0.18m for $E'_1$, $E'_3$ and $E'_{T3}$; and 1I = 290M, 1S = 0.85M and 1A = 0.18M
for $E_2$ and $E_4$. The cost ratio of multiplications over $\mathbb{F}_p$ and
$\mathbb{F}_{p^2}$ is M/m = 0.91. These values and the performance figures (in cycles)
were obtained by benchmarking full implementations on a single core of a 3.4GHz Intel
Core i7-2600 (Sandy Bridge) processor.

Let us now focus on curves $E'_1$, $E_2$ and $E'_{T3}$ to assess the performance of
implementations targeting four scenarios of interest: unprotected and side-channel
protected versions with sequential and multicore execution. Operation counts for
computing a scalar multiplication at approximately 128 bits of security for the
different cases are displayed in Table 2. The techniques to protect and parallelize
our implementations are described in Section 6. In particular, the execution flow
and memory address access of side-channel protected versions are not secret and
are fully independent of the scalar. For our versions running on several cores we
used OpenMP. We use an implementation in which each core is in charge of one
scalar multiplication with $k_i$. Given the high cost of thread creation/destruction,
this approach guarantees the fastest computation in our case (see Section 6.1 for
a discussion). Note that these multicore figures are only relevant for scenarios
in which latency rather than throughput is targeted. Finally, we consider the
cost of constant-time table lookups (denoted by t) given its non-negligible cost
in protected implementations.

Focusing on curve $E'_1$, a significant cost reduction can be observed when
switching from a non-GLV to a GLV-GLS implementation. The speedup is more than
twofold for sequential, unprotected versions. Significant improvements are also
expected when using multiple cores. A remarkable factor 3 speedup is expected
when using GLV-GLS on four cores in comparison with a traditional execution
(listed as non-GLV).

Table 2. Operation counts for scalar multiplication at approximately 128 bits of
security using curves $E'_1$, $E_2$ and $E'_{T3}$ in up to four variants: unprotected
and side-channel protected implementations with sequential and multicore execution.
To determine the total costs we consider 1i = 66m, 1s = 0.76m and 1a = 0.18m for
unprotected versions of $E'_1$ and $E'_{T3}$; 1i = 79$m_s$, 1$s_s$ = 0.81$m_s$ and
1a = 0.17$m_s$ for protected versions of $E'_1$ and $E'_{T3}$; t = 0.83$m_s$ for
$E'_1$ (32pts.); t = 1.28$m_s$ for $E'_{T3}$ (36pts.); t = 0.78$m_s$ for $E'_{T3}$
(20pts.); and 1I = 290M, 1S = 0.85M and 1A = 0.18M for $E_2$. In our case,
M/m = 0.91 and $m_s$/m = 1.11. These values were obtained by benchmarking full
implementations on a 3.4GHz Intel Core i7-2600 (Sandy Bridge) processor.



In general, for our targeted GLV-GLS curves, the speedup obtained by using
four cores is between 42% and 80%. Interestingly, the improvement is greater
for protected implementations, since the overhead of using a regular execution
pattern is minimized when distributing computation among various cores.
Remarkably, protecting implementations against timing attacks increases the cost by
between 28% and 52%, approximately. On the other hand, in comparison with curve
$E_2$, an optimal execution of GLV-GLS on four cores is expected to run about 1.81
times as fast as an optimal execution of the standard 2-GLV on two cores.

To confirm our findings we implemented the different versions using curves $E'_1$,
$E_2$ and $E'_{T3}$. To achieve maximum performance and ease the task of
parallelizing and protecting the implementations, we wrote our own standalone software
without employing any external library. For our experiments we used a 3.4GHz
Intel Core i7-2600 processor, which contains four cores. The timings in terms
of clock cycles are displayed in Table 3. As can be seen, closely following our
analysis, GLV-GLS achieves a twofold speedup over a non-GLV implementation
on a single core. Parallel execution improves performance by up to 76% for
side-channel protected versions. In comparison with the non-GLV implementation,
the four-core implementation runs 3 times faster. Our results also confirm the
lower-than-expected cost of adding side-channel protection. Sequential versions
lose about 50% in performance whereas parallel versions only lose about 28%.
The relative speedup when moving from 2-GLV to GLV-GLS on j-invariant 0
curves is 53%, closely following the theoretical 50% estimated previously.
Four-core GLV-GLS supports a computation that runs 81% faster than the standard
2-GLV on two cores. Finally, in practice our Twisted Edwards curve achieves up
to 9% speedup in the sequential, non-protected scenario in comparison with the
efficient j-invariant 0 curve based on Jacobian coordinates.

Table 3. Point multiplication timings (in clock cycles), 64-bit processor

Curve               Method              Protection  # Cores  Core i7
E'_T3(F_{p_3^2})    4-GLV-GLS, 16pts.   no          1         91,000
E'_T3(F_{p_3^2})    4-GLV-GLS, 36pts.   yes         1        137,000
E'_T3(F_{p_3^2})    4-GLV-GLS, 16pts.   no          4         61,000
E'_T3(F_{p_3^2})    4-GLV-GLS, 20pts.   yes         4         78,000
E'_1(F_{p_1^2})     4-GLV-GLS, 32pts.   no          1         99,000
E'_1(F_{p_1^2})     4-GLV-GLS, 36pts.   yes         1        145,000
E'_1(F_{p_1^2})     4-GLV-GLS, 32pts.   no          4         70,000
E'_1(F_{p_1^2})     4-GLV-GLS, 36pts.   yes         4         89,000
E'_1(F_{p_1^2})     non-GLV, 8pts.      no          1        201,000
E_2(F_{p_2})        2-GLV, 16pts.       no          1        151,000
E_2(F_{p_2})        2-GLV, 16pts.       no          2        127,000
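
The percentages quoted in the discussion of Table 3 can be rechecked directly from the cycle counts. The short Python sketch below does this; the dictionary keys and the helper name are assumptions introduced for the example, while the cycle counts are those listed in Table 3.

```python
# Cycle counts from Table 3, keyed by (curve, method, protected, cores).
CYCLES = {
    ("E'_T3", "4-GLV-GLS", False, 1): 91_000,
    ("E'_T3", "4-GLV-GLS", True, 1): 137_000,
    ("E'_T3", "4-GLV-GLS", False, 4): 61_000,
    ("E'_T3", "4-GLV-GLS", True, 4): 78_000,
    ("E'_1", "4-GLV-GLS", False, 1): 99_000,
    ("E'_1", "4-GLV-GLS", True, 1): 145_000,
    ("E'_1", "4-GLV-GLS", False, 4): 70_000,
    ("E'_1", "4-GLV-GLS", True, 4): 89_000,
    ("E'_1", "non-GLV", False, 1): 201_000,
    ("E_2", "2-GLV", False, 1): 151_000,
    ("E_2", "2-GLV", False, 2): 127_000,
}


def speedup_percent(slow_cycles: int, fast_cycles: int) -> int:
    """Relative speedup of the faster run over the slower one, in percent (rounded)."""
    return round(100 * (slow_cycles / fast_cycles - 1))


# 2-GLV -> GLV-GLS on j-invariant 0 curves, one core.
print(speedup_percent(CYCLES[("E_2", "2-GLV", False, 1)],
                      CYCLES[("E'_1", "4-GLV-GLS", False, 1)]))
# Two-core 2-GLV -> four-core GLV-GLS.
print(speedup_percent(CYCLES[("E_2", "2-GLV", False, 2)],
                      CYCLES[("E'_1", "4-GLV-GLS", False, 4)]))
# Gain from four cores for the protected Twisted Edwards version.
print(speedup_percent(CYCLES[("E'_T3", "4-GLV-GLS", True, 1)],
                      CYCLES[("E'_T3", "4-GLV-GLS", True, 4)]))
```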


Let us now compare our best numbers with recent results in the literature.
Focusing on one-core unprotected implementations, the first author together with
Hu and Xu reported 122,000 cycles for a j-invariant 0 Weierstrass curve
on an Intel Core i7-2600 processor. We report 91,000 cycles with the GLV-GLS
Twisted Edwards curve $E'_{T3}$, improving that number by 34%. We benchmarked
on the same processor the side-channel protected software recently presented by
Bernstein et al., and obtained 194,000 cycles. Thus, our protected implementation,
which runs in 137,000 cycles, improves that result by 42%. Our result
is also 12% faster than the recent implementation by Hamburg. Recent
implementations on multiple cores are reported by Taverne et al. However,
they do not explore the 128-bit security level in their implementations and, hence,
results are not directly comparable. They also report a protected implementation
of a binary Edwards curve that runs in 225,000 cycles on a Core i7-2600
machine, which is 64% slower than our corresponding result. Since the advent of
the carryless multiplier on recent Intel processors, it has been suspected that the
only curves able to perform as well as the GLV-GLS method over large
prime characteristic fields are Koblitz curves over binary fields. In fact, Aranha
et al. very recently presented an implementation of the Koblitz curve K-283
that runs in 99,000 cycles on an Intel Core i7-2600, which is 9% slower than our
GLV-GLS Twisted Edwards curve $E'_{T3}$ (unprotected sequential execution). We
remark that such performance for a binary elliptic curve can only be attained
on very recent processors that possess the so-called carryless multiplier. Aranha
et al. do not report timings for side-channel protected implementations. To the
best of our knowledge, we have presented the first scalar multiplication
implementation running on multiple cores that is protected against timing attacks,
cache attacks and several others.

8 Conclusion

We have shown how to generalize the GLV scalar multiplication method by
combining it with Galbraith-Lin-Scott's ideas to achieve a proven almost
fourfold speedup on GLV curves over $\mathbb{F}_{p^2}$. We have introduced a new and
easy-to-implement reduction algorithm, consisting of two applications of the
extended Euclidean algorithm, one in $\mathbb{Z}$ and the other in $\mathbb{Z}[i]$.
The refined bound
obtained from this algorithm has allowed us to get a relative improvement 
from 2-GLV to 4-GLV-GLS quasi-independent of the curve. Our analysis 
and experimental results on different GLV curves show that in practice one 
should expect speedups close to 50%. We improve performance even further 
by exploiting the Twisted Edwards model over a larger set of curves and show 
that this approach is especially significant to certain GLV curves with slow 
arithmetic in the Weierstrass model. This makes available to implementers new 
curves that achieve close to optimal performance. Moreover, we have shown 
how to protect GLV-based implementations against certain side-channel attacks 
with relatively low overhead and carried out a performance analysis on modern 
multicore processors. Our implementations of the GLV-GLS method improve
the state-of-the-art performance of point multiplication for multiple scenarios:
unprotected and side-channel protected versions with sequential and parallel
execution.

A Examples

We give a few examples of GLV curves, which are curves defined over $\mathbb{C}$ with
complex multiplication by a quadratic integer of small norm, corresponding to
an endomorphism $\phi$ of small degree (see footnote 4). They make up an exhaustive
list, up to isomorphism, in increasing order of endomorphism degree up to degree 3.
While the first four examples appear in the previous literature, the next ones (degree
3) are new and have been computed with the Stark algorithm [26].

Example 1. Let $p \equiv 1 \pmod 4$ be a prime. Define an elliptic curve $E$ over
$\mathbb{F}_p$ by

$$y^2 = x^3 + ax .$$

If $\beta$ is an element of order 4, then the map $\phi$ defined in the affine plane by

$$\phi(x, y) = (-x, \beta y)$$

is an endomorphism of $E$ defined over $\mathbb{F}_p$ with
$\mathrm{End}(E) = \mathbb{Z}[\phi] = \mathbb{Z}[\sqrt{-1}]$, since $\phi$ satisfies
the equation (see footnote 5)

$$\phi^2 + 1 = 0 .$$

Example 2. Let $p \equiv 1 \pmod 3$ be a prime. Define an elliptic curve $E$ over
$\mathbb{F}_p$ by

$$y^2 = x^3 + b .$$

If $\gamma$ is an element of order 3, then we have an endomorphism $\phi$ defined over
$\mathbb{F}_p$ by

$$\phi(x, y) = (\gamma x, y) ,$$

and $\mathrm{End}(E) = \mathbb{Z}[\phi] = \mathbb{Z}\left[\frac{1+\sqrt{-3}}{2}\right]$,
since $\phi$ satisfies the equation

$$\phi^2 + \phi + 1 = 0 .$$

4 By small we mean really small, usually less than 5. In particular, for cryptographic 
applications, the degree is much smaller than the field size. 

5 This is the only case when we cannot apply Lemma QJ. It needs a separate treatment,
given in HH, Appendix B.

Example 3. Let $p > 3$ be a prime such that $-7$ is a quadratic residue modulo $p$.
Define an elliptic curve $E$ over $\mathbb{F}_p$ by

$$y^2 = x^3 - \frac{3}{4} x^2 - 2x - 1 .$$

If $\xi = (1 + \sqrt{-7})/2$ and $a = (\xi - 3)/4$, then we get the
$\mathbb{F}_p$-endomorphism $\phi$ defined by

$$\phi(x, y) = \left( \frac{x^2 - \xi}{\xi^2 (x - a)} ,\; \frac{y (x^2 - 2ax + \xi)}{\xi^3 (x - a)^2} \right)$$

and $\mathrm{End}(E) = \mathbb{Z}[\phi] = \mathbb{Z}\left[\frac{1+\sqrt{-7}}{2}\right]$,
since $\phi$ satisfies the equation

$$\phi^2 - \phi + 2 = 0 .$$

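Example 4. Let $p > 3$ be a prime such that $-2$ is a quadratic residue modulo $p$.
Define an elliptic curve $E$ over $\mathbb{F}_p$ by

$$y^2 = 4x^3 - 30x - 28$$

together with the $\mathbb{F}_p$-endomorphism $\phi$ defined (see footnote 6) by

$$\phi(x, y) = \left( -\frac{2x^2 + 4x + 9}{4(x + 2)} ,\; -\frac{2x^2 + 8x - 1}{4\sqrt{-2}\,(x + 2)^2}\, y \right) .$$

We have $\mathrm{End}(E) = \mathbb{Z}[\phi] = \mathbb{Z}[\sqrt{-2}]$ since $\phi$
satisfies the equation

$$\phi^2 + 2 = 0 .$$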
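
As a quick sanity check of the map in Example 4, the following Python sketch verifies over a small field that $\phi$ sends affine points of the curve to points of the curve. The prime $p = 11$ (for which $3^2 \equiv -2$) and all function names are assumptions chosen for illustration; the check does not establish the relation $\phi^2 + 2 = 0$, only that the rational map is compatible with the curve equation.

```python
P = 11        # hypothetical small prime for which -2 is a quadratic residue
SQRT_M2 = 3   # 3^2 = 9 = -2 (mod 11)


def inv(a: int) -> int:
    """Modular inverse in F_P (requires Python 3.8+)."""
    return pow(a, -1, P)


def on_curve(x: int, y: int) -> bool:
    """Membership test for y^2 = 4x^3 - 30x - 28 over F_P."""
    return (y * y - (4 * x**3 - 30 * x - 28)) % P == 0


def phi(x: int, y: int) -> tuple[int, int]:
    """The endomorphism of Example 4; undefined at x = -2, the kernel point."""
    d = (x + 2) % P
    new_x = -(2 * x * x + 4 * x + 9) * inv(4 * d) % P
    new_y = -(2 * x * x + 8 * x - 1) * y * inv(4 * SQRT_M2 * d * d) % P
    return new_x, new_y


points = [(x, y) for x in range(P) for y in range(P) if on_curve(x, y)]
images = [phi(x, y) for (x, y) in points if (x + 2) % P != 0]
print(all(on_curve(x, y) for (x, y) in images))
```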


Example 5. Let $p > 3$ be a prime such that $-11$ is a quadratic residue modulo
$p$. We define the elliptic curve $E$ over $\mathbb{F}_p$

$$y^2 = x^3 - \frac{13824}{539} x + \frac{27648}{539}$$

with $a = (1 + \sqrt{-11})/2$ and the endomorphism $\phi$ defined by


$\phi(x, y) =$

( 539 , 539 \ 3 . /28 _ 355 2 , /_92 i , 1728 , 192 

V 5184 a 1728 / x ~ t ~V27 a 18/ ^ t l 9 u 3/ X ^ 77 a ~ r ~77 

(Si«-Stb ! + (-W« +§)* + ¥»-! 

( 3773 _ 18865 \ 3 , ( 2695 , _539_\ 2 , _ 91\ , 20 , 1 

V 373248 u 005328 1 J ^ 1 20736 u T 3456/ ^ ~ t432 u 144/ ^ ~ 27 u ' 9 

( 18865 , 116963 \ 3 i 7 7007 _ 539\ 2 i /_ 791 , 581 \ „ , 74 _ 35 

V 1492992 u ' 995328/ ‘ r V 20736 “ 432 1 _r 1 432 “ _r 144 ) x ' 27 “ 9 


such that $\mathrm{End}(E) = \mathbb{Z}[\phi] = \mathbb{Z}\left[\frac{1+\sqrt{-11}}{2}\right]$.
The characteristic polynomial of $\phi$ is $\phi^2 - \phi + 3 = 0$.

6 We take the opportunity to correct a typo found and transmitted in many sources,
where a $y$ factor was absent in the second coordinate. Its sign is irrelevant.

Example 6. Let $p > 3$ be a prime such that $-3$ is a quadratic residue mod $p$.
We define the elliptic curve $E$ over $\mathbb{F}_p$

$$y^2 = x^3 - \frac{3375}{121} x + \frac{6750}{121}$$

with the endomorphism $\phi$ defined by

$$\phi(x, y) = \left( -\frac{1331x^3 - 10890x^2 + 81675x - 189000}{33(11x - 45)^2} ,\; -\frac{1331x^3 - 16335x^2 + 7425x + 43875}{3\sqrt{-3}\,(11x - 45)^3}\, y \right)$$

such that (see footnote 7) $\mathrm{End}(E) = \mathbb{Z}[\phi] = \mathbb{Z}[\sqrt{-3}]$.
The characteristic polynomial of $\phi$ is

$$\phi^2 + 3 = 0 .$$

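B A New Four-Dimensional Lattice Reduction Algorithm

Algorithm 1 (Cornacchia's GCD algorithm in $\mathbb{Z}$)

Input: $n \equiv 1 \pmod 4$ prime, $1 < \mu < n$ such that $\mu^2 \equiv -1 \pmod n$.
Output: $\nu = \nu_R + i\nu_I$ Gaussian prime dividing $n$, such that $\nu P = 0$.

1. initialize:
   $r_0 \leftarrow n$, $r_1 \leftarrow \mu$, $r_2 \leftarrow n$,
   $t_0 \leftarrow 0$, $t_1 \leftarrow 1$, $t_2 \leftarrow 0$,
   $q \leftarrow 0$.

2. main loop:
   while $r_1^2 \geq n$ do
      $q \leftarrow \lfloor r_0 / r_1 \rfloor$,
      $r_2 \leftarrow r_0 - q r_1$, $r_0 \leftarrow r_1$, $r_1 \leftarrow r_2$,
      $t_2 \leftarrow t_0 - q t_1$, $t_0 \leftarrow t_1$, $t_1 \leftarrow t_2$.

3. return:
   $\nu = r_1 - i t_1$, $\nu_R = r_1$, $\nu_I = -t_1$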
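
The steps of Algorithm 1 can be sketched in a few lines of Python. This is a minimal transcription under the stated input assumptions ($n$ prime, $n \equiv 1 \pmod 4$, $\mu^2 \equiv -1 \pmod n$); the function name is hypothetical and the sample input $n = 13$, $\mu = 5$ is chosen only for illustration.

```python
def cornacchia_gcd(n: int, mu: int) -> tuple[int, int]:
    """Return (nu_R, nu_I) such that nu = nu_R + i*nu_I divides n in Z[i].

    Assumes n is a prime with n = 1 (mod 4) and that mu satisfies
    mu^2 = -1 (mod n) with 1 < mu < n.
    """
    if (mu * mu + 1) % n != 0:
        raise ValueError("mu must satisfy mu^2 = -1 (mod n)")
    r0, r1 = n, mu
    t0, t1 = 0, 1
    # Truncated extended Euclidean algorithm: stop once r1 drops below sqrt(n).
    while r1 * r1 >= n:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        t0, t1 = t1, t0 - q * t1
    return r1, -t1


nu_r, nu_i = cornacchia_gcd(13, 5)
print(nu_r, nu_i, nu_r**2 + nu_i**2)
```

The invariant $r_j \equiv t_j \mu \pmod n$ holds throughout the loop, so the returned pair satisfies $\nu_R + \nu_I \mu \equiv 0 \pmod n$, and its norm $\nu_R^2 + \nu_I^2$ equals $n$.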


7 This is the first example where the endomorphism ring is not the maximal order
of its field of fractions. It can be summarily seen as follows:
$\mathrm{End}(E) \supset \mathbb{Z}[\sqrt{-3}]$. If not equal, then it must be the full
ring of integers $\mathbb{Z}\left[\frac{1+\sqrt{-3}}{2}\right]$. This would imply
that $j = 0$, as there is only $h(-3) = 1$ isomorphism class of elliptic curves with
complex multiplication by $\mathbb{Z}\left[\frac{1+\sqrt{-3}}{2}\right]$, given in
Example 2 (see |2E| for an abridged description of the theory of complex
multiplication). This is clearly not the case here. Alternatively, one can see that
there would exist a nontrivial automorphism (a primitive cube root of unity)
corresponding to $\frac{-1+\sqrt{-3}}{2}$. A direct computation then shows this is
impossible.

Algorithm 2 (Cornacchia's algorithm in compact form)

Input: $\nu$ Gaussian prime dividing $n$ rational prime, $1 < \lambda < n$ such
that $\lambda^2 + r\lambda + s \equiv 0 \pmod n$.

Output: Two $\mathbb{Z}[i]$-linearly independent vectors $v_1$ and $v_2$ of
$\ker F \subset \mathbb{Z}[i]^2$ of rectangle norms
$< 51.5 \left(\sqrt{1 + |r| + s}\right) n^{1/4}$.

1. initialize:
   If $\lambda^2 \geq 2n$ then
      $r_0 \leftarrow \lambda$,
   else
      $r_0 \leftarrow \lambda + n$,
   $r_1 \leftarrow \nu$, $r_2 \leftarrow n$,
   $s_0 \leftarrow 1$, $s_1 \leftarrow 0$, $s_2 \leftarrow 0$,
   $q \leftarrow 0$.

2. main loop:
   while $|r_2|^4 (1 + |r| + s)^2 \geq n$ do
      $q \leftarrow$ closest Gaussian integer to $r_0 / r_1$,
      $r_2 \leftarrow r_0 - q r_1$, $r_0 \leftarrow r_1$, $r_1 \leftarrow r_2$,
      $s_2 \leftarrow s_0 - q s_1$, $s_0 \leftarrow s_1$, $s_1 \leftarrow s_2$.

3. return:
   $v_1 = (r_0, -s_0)$, $v_2 = (r_1, -s_1)$

Abstract. Together with masking, shuffling is one of the most frequently
considered solutions to improve the security of small embedded
devices against side-channel attacks. In this paper, we provide a
comprehensive study of this countermeasure, including improved
implementations and a careful information theoretic and security analysis of
its different variants. Our analyses lead to important conclusions as they
moderate the strong security improvements claimed in previous works.
They suggest that simplified versions of shuffling (e.g. using random start
indexes) can be significantly weaker than their counterpart using full
permutations. We further show with an experimental case study that such
simplified versions can be as easy to attack as unprotected implementations.
We finally exhibit the existence of "indirect leakages" in shuffled
implementations that can be exploited due to the different leakage models
of the different resources used in cryptographic implementations. This
suggests the design of fully shuffled (and efficient) implementations, where
both the execution order of the instructions and the physical resources
used are randomized, as an interesting scope for further research.

1 Introduction 

Already in the first Differential Power Analysis (DPA) paper, Kocher et al.
mentioned time randomization as a possible solution to improve security against
side-channel attacks. Subsequently, different countermeasures have been proposed
to exploit this idea, e.g. relying on the addition of random delays,
shuffling the execution order of independent operations or, more generally,
trying to build a non-deterministic processor. As usual in side-channel
attacks, the main question regarding these solutions is: "to what extent do they
improve security, and at what cost?" In this paper, we propose a comprehensive
treatment of this question in the case of the shuffling countermeasure.

For this purpose, we start with the efficiency issue. In general, shuffling can be
applied to any set of independent operations. The SubBytes layer of 16 S-boxes
in the AES Rijndael is a typical example. Randomizing such operations can be
done in different ways. Taking the extreme cases, either the S-boxes are executed
according to a Random Permutation (RP) among 16! possible ones, or they are
executed from a Random Start Index (RSI) among 16 possible ones, which is then
incremented. This difference is nicely illustrated by previous works on shuffled
implementations of the AES. In a first paper from 2006, the authors use partial
shuffling and first-order masking based on S-box pre-computation. Whereas
masking is applied to the whole cipher, shuffling is only applied to the first and
last rounds. Furthermore, the RSI approach is pursued, for performance reasons.
In a second work from Rivain et al., higher-order masking is implemented and
shuffling is mainly based on the RP approach. Yet, for the MixColumns
operations, only the (first-order masked) columns are shuffled, accounting for
8! possible permutations. That is, thanks to the first-order masking, they have
8 positions that can be shuffled, vs. 4 if MixColumns was not masked, and 16
for the other AES transforms. Implementation details are not given in that work, but
we assume that this choice is again motivated by performance reasons, with a
MixColumns operation implemented with xtime tables. Apart from those
works, shuffling was also applied to hardware implementations with 8- or 32-bit
datapaths, where RSI is usually preferred as it comes nearly for free.
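
The difference between the two extreme strategies can be sketched as follows. This Python fragment is purely illustrative (the function names are hypothetical, and `secrets.randbelow` stands in for whatever random source a device would use): it generates an execution order for 16 independent operations under RSI and under RP, and prints the number of possible orders in each case.

```python
import secrets
from math import factorial


def rsi_order(n: int = 16) -> list[int]:
    """Random Start Index: draw a start in {0, ..., n-1}, then increment modulo n."""
    start = secrets.randbelow(n)
    return [(start + k) % n for k in range(n)]


def rp_order(n: int = 16) -> list[int]:
    """Random Permutation: uniform shuffle, so all n! orders are possible."""
    order = list(range(n))
    for i in range(n - 1):
        j = i + secrets.randbelow(n - i)  # uniform index in the remaining tail
        order[i], order[j] = order[j], order[i]
    return order


print("RSI order:", rsi_order())
print("RP order: ", rp_order())
print("possible RSI orders:", 16)
print("possible RP orders: ", factorial(16))
print("possible RP orders for 8 positions:", factorial(8))
```

An RSI order is always a rotation of 0, ..., 15, so learning the position of a single operation reveals the whole order, whereas an RP order leaves the remaining positions undetermined.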

Following this state of the art, our first contribution is to improve the
performance of software implementations using the RP approach. In this respect, we
start from the observation that in an unprotected block cipher implementation,
one usually keeps as much data as possible in the processor registers, in order
to minimize RAM access. By contrast, once random access to these registers is
required (as in shuffled implementations), RAM usage is inevitable. This implies
that any register access becomes a series of load and store operations, resulting
in major performance overheads. We mitigate these overheads by exploiting a
different technique, which consists in manipulating the program flow. It allows
us to operate on registers while at the same time randomizing the sequence of
operations. In practice, we present two approaches: the first one changes the
program flow "on-the-fly", while the other one re-writes the program memory
prior to execution. The latter approach can be viewed as an adaptation of the
self-modifying codes used in software engineering [2], also applied to counteract
side-channel attacks in earlier work. Our new solutions come with contrasting
performance results. Namely, the on-the-fly proposal minimizes the overall cycle
count, while the program memory manipulations allow very efficient online
encryption at the cost of long (possibly offline) pre-computations. For
illustration, we apply these proposals (and previously published ones) to the
publicly available AES Furious implementation. Besides, we also investigate the
efficient generation of (small) random permutations in low-cost microcontrollers.
That is, we take a well-known optimal algorithm for permutation generation and
modify it slightly, in order to improve its performance. As a result, we are able to
generate close-to-uniform permutations, and obtain an efficient alternative to
previous proposals.

Next, we investigate the security of shuffling against side-channel attacks.
Here, we start from the observation that the existing literature usually evaluates
the impact of shuffling based on a so-called "integrated DPA" (aka windowing
attack), introduced in earlier work and applied, e.g., in [26,30]. Intuitively, if the
manipulation of a sensitive variable is spread over $t$ time samples, its correlation
with the actual leakages will be reduced by a factor $\sqrt{t}$ using such an attack,
instead of $t$ without integration. Integrating is a convenient tool for evaluation
as it can be directly used to estimate the data complexity of a DPA using Mangard's
formulas. Yet, a possible limitation of this technique is that it treats the RSI and
RP cases in the same way. Hence, a natural question is to determine whether
these two approaches are indeed equivalent in general, or if advanced evaluation
tools can be used to put forward additional weaknesses for RSI implementations.
Our results regarding this question are summarized as follows.

First, we specialize the information theoretic and security analysis from [28] to
the context of shuffled implementations. It allows us to confirm that integrated
DPA is indeed suboptimal compared to a Bayesian exploitation of the leakages.
While our worst-case evaluations rely on profiled attacks [6,27], we believe they
are important to moderate claims of strong security improvements provided by
shuffling (e.g. a previously reported data complexity increase by a factor of 360).
In particular, these results complement the previous work of Asiacrypt 2010, in which
such an information theoretic and security analysis was performed for masking. As a
result and for the first time, we obtain lower bounds for the data complexity of
standard side-channel attacks against shuffled implementations.

Second, we notice that security evaluations for masking always combine the
leakage corresponding to the masked data and its masks, e.g. [21,24]. Quite
surprisingly, and to the best of our knowledge, the impact of such a scenario has
not been investigated in the case of shuffling. Therefore, we include the possibility 
of a leakage on the permutation (or start index) manipulated when shuffling. 
We show that as soon as some information is leaked about them, attacks against 
RSI- and RP-based implementations become significantly different, the RSI case 
being much easier to attack, for computational reasons. 

Finally, we observe that direct leakages about the start index or permutations 
naturally arise in practice and can be exploited. More surprisingly, we also show 
the existence of “indirect leakages” , coming from the different power consump- 
tion models of the hardware resources manipulating the key bytes. For example, 
since the 16 registers used in our shuffled Furious implementations have (slightly) 
different models, marginalizing the distribution of the observed leakage over the 
16 AES key bytes provides information about which S-box is computed. 

Summarizing, we observe that all previous works on shuffling reduced the size 
of the permutation set for some of the operations in the protected algorithm. 
Hence, our results bring the important cautionary note that time complexity is 
critical in the security evaluation of this countermeasure, as permutations with 
a small size can be enumerated which leads to exploitable weaknesses. In this 
respect, an implementation protected with RSI-based shuffling can sometimes be 
as weak as an unprotected one. As for the RP-based solution, we recall that it can 
be used as a noise amplifier for leaking devices, but never as a noise generator. 

2 Efficient Implementations 

This section explores the software design space for shuffling the AES on an Atmel 
ATMega644P microcontroller |3j. We first describe an efficient way to obtain 
close-to- uniform permutations in this device. Next, we show how to obtain an 
AES implementation for which every transform can be shuffled according to 
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such permutations, including MixColumns (and the key scheduling algorithm if 
needed). Afterwards, we describe different implementations: a basic one relying 
on a previously proposed “double indexing” method, and two optimized ones 
relying on randomized execution path and program memory. We finally provide 
precise performance evaluations and a comparison with previous works. 
Permutation Generation. The first building block of a shuffled implementation
is a permutation generator. From a sequence S := {1, . . . , n}, a uniform
permutation can be produced in linear time [?]. The original algorithm iterates
over every element $S_i$ (with $i$ from 0 to $n-2$), and swaps it with a random
element from the remaining tail, i.e. $\{S_i, \ldots, S_{n-1}\}$. However, sampling
from $\{i, \ldots, n-1\}$ needs either a modulo operation and a random number greater
than n to start from, or an approach with probabilistic run-time. We avoided these
performance drawbacks by sampling from $\{0, \ldots, n-1\}$. Permuting a sequence
of 16 entries following this algorithm takes 362 cycles on our device, using 8 bytes
of randomness. It can still generate all permutations, but with a slight bias
that decreases with the size of the permuted set. To estimate the impact of this
bias for different sizes N of the permutation set, we systematically sampled $10^8$
permutations generated with this method, and built histograms with N! bins.
We then estimated the Euclidean distance between these biased histograms and
a uniform distribution. In addition, we compared this situation with the one
obtained with a quite minimal side-channel leakage. Namely, we assumed that
the Hamming weight of the first entry of a (uniformly generated) permutation
is known to be the least informative one (i.e. with half of the bits set to one). As
can be observed in Table 1, the bias due to this small side-channel information
is already significantly larger than the one due to the permutation generation
algorithm. Furthermore, actual leakages in Sections 4 and 5 affect all the
permutation entries, which further reduces the bias of the permutation generation
algorithm compared to the one caused by physical information. Eventually, we
will show in the next sections that exploiting these biases in a side-channel
attack where we shuffle among 16! possible permutations is computationally hard.
Therefore, we conclude that our performance-optimized algorithm should not
lead to a significant security reduction of the shuffling countermeasure.
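
The bias of the simplified generator can be computed exactly for small sets. The
following Python sketch assumes that the generator visits all n positions and swaps
each one with a position drawn uniformly from {0, ..., n-1}; this is our reading of
the description above, not the authors' AVR code, and the function names are
illustrative. It enumerates all n^n swap sequences and reports the Euclidean
distance to the uniform distribution over the n! permutations.

```python
from collections import Counter
from itertools import product
from math import factorial, sqrt


def biased_shuffle_distribution(n):
    """Exact output distribution of the simplified shuffle (assumed variant)."""
    counts = Counter()
    # Enumerate every possible sequence of swap targets (n choices per position).
    for swaps in product(range(n), repeat=n):
        seq = list(range(n))
        for i, j in enumerate(swaps):
            seq[i], seq[j] = seq[j], seq[i]
        counts[tuple(seq)] += 1
    total = n ** n
    return {perm: cnt / total for perm, cnt in counts.items()}


def distance_to_uniform(dist, n):
    """Euclidean distance between `dist` and the uniform law on n! permutations."""
    uniform = 1.0 / factorial(n)
    unreachable = factorial(n) - len(dist)  # permutations never produced, if any
    squared = sum((p - uniform) ** 2 for p in dist.values())
    return sqrt(squared + unreachable * uniform ** 2)


if __name__ == "__main__":
    for n in range(3, 7):
        dist = biased_shuffle_distribution(n)
        print(n, round(distance_to_uniform(dist, n), 5))
```

For n = 3 this exact computation gives a distance of about 0.0454, in line with the
first sampled entry of Table 1.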


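Table 1. Bias of the optimized permutation algorithm vs. bias of a small SCA leakage

N                | 3       | 4       | 5       | 6       | 7       | 8       | 9
-----------------+---------+---------+---------+---------+---------+---------+--------
Perm. generation | 0.04535 | 0.03522 | 0.02034 | 0.00993 | 0.00430 | 0.00170 | 0.00063
Small SCA leak.  | 0.28868 | 0.20412 | 0.07454 | 0.03726 | 0.01627 | 0.00643 | 0.00234
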


Obtaining Independent Operations. Applying shuffling to an implementation
requires finding sets of independent operations. In the AES case, sets of 16
independent operations naturally arise from the AddRoundKey and SubBytes
transforms. By contrast, the situation is a little trickier for ShiftRows and
MixColumns. For example, implementing ShiftRows requires one extra byte of
storage in an unprotected implementation, and two in the case of RSI-based
shuffling (i.e. when the permutation is “monotonous”, which restricts the number of
permutations to 16). But if 16 independent operations are desired, 16 bytes of
temporary storage are required. As for MixColumns, four additional registers are
sufficient if the state is processed column-wise, but this would then account for
only 4! permutations. Hence, having 16 independent operations again requires 16 bytes
of temporary storage. Since our device has only 32 registers, some of which are
already occupied, RAM usage becomes inevitable for shuffling. Besides, the
key schedule has only four independent operations by default. This is because
within one key schedule round, there are only four S-box executions. Thus, the
smallest number of indistinguishable operations is four. Yet, applications
requiring on-the-fly key expansion also need an appropriate SPA protection to prevent
attacks such as [?]. In these cases, we interleaved the real key schedule with
three dummy key schedules, in order to obtain 16 shuffleable operations.

Basic Implementation with Double Indexing. Direct shuffling requires an 
indirect indexing of the operands. That is, a counter is used to index a permu- 
tation vector, and the result is used to index the operand vector. Thus, instead 
of operating on registers directly, two RAM accesses are required for each (read 
or write) access to operands. This naturally leads to quite large cycle counts, 
as in AVR devices, load and store operations take two cycles (compared to one 
cycle for arithmetic and logic operations). Implementing a fully shuffled AES 
this way results in an execution time of 30 202 cycles, excluding the key sched- 
ule. In the following we propose two different strategies in order to improve on 
these figures. In both cases, instructions are shuffled rather than data location, 
in order to allow register usage. Precisely, we are still limited by the number 
of available registers when performing certain transforms. But contrary to the 
double indexing proposal, we do not always access RAM when operating on
intermediate data. The first solution changes the execution path on-the-fly, while the
second actually rewrites the program memory (i.e. assuming that this re-writing
is pre-computed, this solution can be seen as a simplified one-time program [?]).
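
The double indexing of the basic implementation can be sketched in a few lines of
Python. This is a minimal illustration, not the authors' assembly code: a counter
indexes the permutation vector, and the fetched value indexes the operand vector,
so each operand access costs two memory look-ups. The name `shuffled_layer` and the
example operation are assumptions made for the sketch.

```python
import random


def shuffled_layer(state, operation, perm):
    """Apply `operation` to every state byte, visiting bytes in the order `perm`."""
    for counter in range(len(perm)):
        index = perm[counter]                    # first access: permutation vector
        state[index] = operation(state[index])   # second access: operand vector
    return state


if __name__ == "__main__":
    perm = list(range(16))
    random.shuffle(perm)                         # fresh permutation per execution
    state = list(range(16))
    # The result does not depend on perm, only the order of the accesses does.
    print(shuffled_layer(state, lambda b: b ^ 0x5A, perm))
```

Because the 16 operations are independent, the output is the same for every
permutation; only the time at which each byte is processed changes.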

Optimized Implementation with Randomized Execution Path. For this
implementation, the assembly code of every (compound of) round transform(s) 
is split into 16 independent blocks of instructions. Each of the 16 blocks is aug- 
mented with a label. This allows us to identify its address in ROM. Furthermore, 
every transform is associated with an array of 17 16-bit words, where the first 
16 words hold the addresses of the 16 blocks, and the 17th holds the address of 
the return instruction. The array content is initialized with the addresses of the 
labels at compile time. Finally, we append a flow-control macro to each of the 
16 blocks. This macro performs three things: fetch an address from the array, 
advance the pointer to the next array entry, and jump to the fetched address. 

During the execution of the cipher, we first re-order the first 16 addresses in 
the array, according to a previously generated permutation. Then, when we enter 
a transform, we set a pointer to the beginning of the array and execute the flow- 
control macro. This causes the execution of the first block and sets a pointer to 
the address of the next block. The flow-control macro is executed 16 times, until 
it finally looks up the address of the return instruction. In practice, we defined 
several sets of transforms and therefore need an address array for each of them.
The first one is the compound of AddRoundKey, SubBytes, and ShiftRows.
This transform reads the state from RAM and stores the result into the register
file. The next one performs MixColumns and stores the result back to RAM.
Afterwards, we perform one iteration of the key schedule. Similar to ShiftRows
and MixColumns, this implies additional memory requirements, because of the
RotWord operation. We finally need a standalone AddRoundKey layer for the
last round. As each of the address arrays needs 17 × 2 bytes, we have an additional
RAM use of 170 bytes (for technical reasons, the key schedule uses two 17 × 2-byte
address arrays). Permuting each of these arrays takes 205 cycles, implying
an overhead of 1225 cycles. Eventually, for every set of transforms, we need to
load an address and jump to this address 17 times, each of which takes 6 cycles.
Together with the preamble to set up the array pointer, it leads to an additional
overhead of 108 cycles for each of these compounds of transforms.
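
The address-array mechanism described above can be mimicked in Python, with
callables standing in for code addresses. This is only a sketch of the control
flow under that assumption; `make_blocks` and `run_transform` are hypothetical
names, and the real implementation performs the fetch, pointer increment and jump
in an assembly macro appended to each block.

```python
def make_blocks(n=16):
    """Build n independent 'instruction blocks'; block i only touches state[i]."""
    def make(i):
        def block(state):
            state[i] = (state[i] + 1) & 0xFF
        return block
    return [make(i) for i in range(n)]


def run_transform(blocks, perm, state):
    """Execute all blocks once, in the order given by the permutation `perm`."""
    RETURN = None
    # Re-order the first 16 entries; the 17th entry plays the role of the
    # address of the return instruction.
    table = [blocks[p] for p in perm] + [RETURN]
    pointer = 0
    while True:
        target = table[pointer]   # fetch an address from the array
        pointer += 1              # advance the pointer to the next entry
        if target is RETURN:      # 17th look-up: leave the transform
            break
        target(state)             # "jump" to the fetched block
    return state


if __name__ == "__main__":
    order = list(reversed(range(16)))
    print(run_transform(make_blocks(), order, [0] * 16))
```

With 17 look-ups at 6 cycles each, the dispatch itself accounts for 102 of the 108
cycles quoted above, which would leave 6 cycles for the preamble.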

Optimized Implementation with Randomized Program Memory. For this
implementation, we used the self-programming capabilities of the
ATMega644P microcontrollers. As the shuffling applies to independent
operations, and as for each operation, the state bytes are always stored in the same
registers or RAM locations, the execution order of the operations can be per- 
muted by modifying the data corresponding to these locations in program mem- 
ory. In our target controller, the program memory has to be modified one page 
(i.e. 256 bytes) at a time. Hence, the shuffling can be prepared in five steps. 
First, the page is transferred from program memory to the RAM. Afterwards, 
the bytes of code corresponding to state-byte locations are modified according 
to the permutation vector. Then, the previous version of the page is erased from 
program memory, and the new page is loaded into a page buffer. Finally, this 
page buffer is written in program memory. This process is executed before each 
AES execution. The main advantage of this solution is that after pre-processing 
of a shuffled program memory, the execution time of the AES is nearly the same 
as for the unprotected implementation. Minor differences come from the fact
that we need independent operations, which implies using RAM for
the storage of some intermediate results. Its main drawback is the long
pre-computation time, which accounts for approximately 18 milliseconds,
independently of the clock frequency. This comes from the time-consuming instructions
used to erase program memory and write the page buffer into memory (4.5 milliseconds
per page written or erased [?]), and the coarse granularity of these instructions (i.e.
working at the page level) in the Atmel controllers. More flexible devices (e.g.
devices with ARM architectures) would help relax this limitation. Note
also that our target Atmel’s EEPROM allows only for 10000 re-write cycles,
which could possibly lead to DoS attacks. If this is an issue, and depending on
the actual available ROM, different areas can be used randomly, which increases
the number of possible encryptions by some factor. Again, alternative devices
could be considered to relax this limitation. For example, the ARM LPC214x
series already allows for 100000 cycles. Note finally that, as this implementation
mainly makes sense if pre-processing is allowed, it is naturally executed with a
pre-computed key scheduling.

Implementation Results. The performance results of our implementations
are compared with previous works in Table 2. Namely, we use the AES Furious
as reference. As for protected implementations, we considered the basic one based 
on double indexing and the ones of Herbst et al. and Rivain et al. However, as 
mentioned above, they do not allow direct comparison. Herbst et al. only protect 
the outer rounds (one and ten) with RSI-based shuffling, but implement masking 
for all the rounds and the key schedule. Rivain et al. implement higher-order 
masking and use a “simplified” shuffling for the MixColumns operation (they 
also work on a different 8051-based architecture). The implementation for which 
we give cycle numbers is not masked except for MixColumns. By contrast, our 
implementations use log-table based polynomial multiplication, and are able to 
shuffle all bytes during MixColumns. Not surprisingly, our implementation based 
on double indexing is the slowest. Its performance is comparable to the one of 
Rivain et al. Manipulating the program counter allows us to get a performance 
improvement of almost a factor five and, excluding the key scheduling, leads to 
encryption time only twice as slow as Rijndael Furious. As previously mentioned, 
the larger overheads when executing the key scheduling come from the need to 
execute additional dummy schedulings, in order to keep a permutation among 
16! for this part of the implementation. Finally, the randomized program memory 
allows the fastest online encryption (i.e. excluding program re-writing). 


Table 2. Implementation result comparison

Implementation           | Clock cycles       | RAM [byte]
-------------------------+--------------------+-----------
Furious [23]             | 2 739              | 176
Furious with KS [23]     | 3 546              | 176
Herbst et al. [?]        | 11 845             | -
Rivain et al. [?]        | 29 400             | -
Dbl. ind.                | 30 202             | 240
Dbl. ind. with KS        | 46 395             | 132
Rand. exec. path         | 6 934              | 394
Rand. exec. path with KS | 14 834             | 302
Rand. prog. mem.         | 3 299 (+~18 msec)  | 480
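
The relative overheads can be recomputed directly from the cycle counts of Table 2.
The short Python sketch below does only this arithmetic; the dictionary keys are
the table's row labels and `slowdown` is an illustrative helper.

```python
# Cycle counts copied from Table 2 (rows without a RAM figure are omitted).
CYCLES = {
    "Furious": 2739,
    "Furious with KS": 3546,
    "Dbl. ind.": 30202,
    "Dbl. ind. with KS": 46395,
    "Rand. exec. path": 6934,
    "Rand. exec. path with KS": 14834,
    "Rand. prog. mem.": 3299,
}


def slowdown(name, reference):
    """Ratio of the cycle counts of two implementations."""
    return CYCLES[name] / CYCLES[reference]


print(round(slowdown("Dbl. ind.", "Rand. exec. path"), 2))   # gain of the exec. path
print(round(slowdown("Rand. exec. path", "Furious"), 2))     # cost vs. unprotected
print(round(slowdown("Rand. prog. mem.", "Furious"), 2))     # online cost only
```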


3 Evaluation Framework 

We now move to the security analysis of the shuffling countermeasure and its
variants. For this purpose, we rely on the evaluation framework from [?] and
adapt it to capture the specificities of shuffled implementations. In order to have
a fair understanding of the strengths and weaknesses of the countermeasure, we
pay particular attention to worst-case (profiled) attacks. But for completeness,
we also compare them with the integrated DPA used in previous works.

Notations. Variables are denoted with capital letters, sampled values with
lowercase letters and functions with sans serif fonts. We consider the standard
DPA attacks described in [?] and illustrate our notations with the case of the
AES Rijndael. In this context, the adversary tries to recover a 16-byte master
key $k = \{k_0, k_1, \ldots, k_{15}\}$ from a leakage corresponding to the first key
addition and S-box layers. In attacks against unprotected implementations, each S-box
is executed at a well-defined time instant, giving rise to key leakages defined as:

\[ L_0 \leftarrow \mathsf{Sbox}(k_0 \oplus X_0), \quad
   L_1 \leftarrow \mathsf{Sbox}(k_1 \oplus X_1), \quad
   L_2 \leftarrow \mathsf{Sbox}(k_2 \oplus X_2), \quad \ldots \]

That is, we have 16 leakage points (or cycles) $L_c$ (where $c$ is the cycle index) and
16 subkeys $k_s$. If we denote the part of the master key that is manipulated at time
$c$ with a variable $S_c$, we straightforwardly have $S_c = c$ in this unprotected case.
Note that the variable nature of the leakages comes both from possible noise in
the measurements and the variable (known) inputs $X_i$. By contrast, in the case
of a shuffled implementation, the execution order of the S-box computations is
randomized according to a permutation $P$, leading to key leakages of the form:

\[ L_0 \leftarrow \mathsf{Sbox}(k_{P(0)} \oplus X_{P(0)}), \quad
   L_1 \leftarrow \mathsf{Sbox}(k_{P(1)} \oplus X_{P(1)}), \quad
   L_2 \leftarrow \mathsf{Sbox}(k_{P(2)} \oplus X_{P(2)}), \quad \ldots \]

That is, we have $S_c = P(c)$ with $P$ the secret permutation that is re-generated
for every new input block, e.g. with the algorithm in Section 2. In this protected
case, not only leakage about the S-box execution may be obtained, but also
leakage on the permutation used in the shuffled implementation. In theory, an
attack could exploit sixteen “direct” permutation leakages denoted as $L'_c \leftarrow S_c$.
Such notations allow us to reflect both the RSI- and RP-based shuffling methods.
In the first case, we have $P(c) = c + r \pmod{16}$, with $r$ drawn at random from
$[0:15]$. In the second case, $P$ is directly picked at random among the set of all
16! permutations.
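
The two ways of drawing $P$ can be contrasted with a short Python sketch. It is a
simulation aid only, with hypothetical function names, and it uses Python's
standard generator rather than the device's randomness source.

```python
import random


def rsi_permutation(rng, n=16):
    """Random start index: P(c) = (c + r) mod n for a single random r."""
    r = rng.randrange(n)
    return [(c + r) % n for c in range(n)]


def rp_permutation(rng, n=16):
    """Random permutation: P drawn among all n! permutations."""
    perm = list(range(n))
    rng.shuffle(perm)
    return perm


if __name__ == "__main__":
    rng = random.Random(2012)
    print("RSI:", rsi_permutation(rng))
    print("RP: ", rp_permutation(rng))
    # RSI can only ever produce 16 distinct execution orders.
    print(len({tuple(rsi_permutation(rng)) for _ in range(10000)}))
```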

Information Theoretic Analysis. As a first step in our evaluation, we perform
an information theoretic analysis that aims to capture the worst-case security
of an implementation. In general, and for a fixed key byte $K_s$, we assume that the
adversary can observe a leakage vector $L = \{L_0, L_1, \ldots, L_{15}\}$. The goal of
this evaluation is to obtain an accurate estimation of the mutual information¹:

\[ \mathrm{MI}(K_s; L, X) = \mathrm{H}[K_s] + \sum_{k} \Pr[K_s = k] \sum_{x} \Pr[X = x]
   \cdot \int_{l} \Pr[L = l \mid K_s = k, X = x] \cdot
   \log_2 \Pr[K_s = k \mid L = l, X = x] \, dl. \]

In this equation, the term $\Pr[K_s = k \mid L = l, X = x]$ is directly obtained from
$\Pr[L = l \mid K_s = k, X = x]$ using Bayes’ theorem. Hence, it is this last conditional
leakage probability that is most critical to evaluate. For convenience, we will
ignore the variable $X$ in the rest of the paper, as it is assumed to be known for
all computations. Next, we will consider two main evaluation scenarios.
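
The mutual information defined above can be evaluated numerically on a toy
example. The sketch below is an illustration under loud assumptions: it replaces
the AES S-box by the 4-bit PRESENT S-box to keep the computation small, considers a
single unprotected S-box with Hamming weight leakage and Gaussian noise of variance
`sigma2`, takes uniform keys and plaintexts, and integrates over $l$ with a plain
Riemann sum. All names are hypothetical.

```python
from math import exp, log2, pi, sqrt

# 4-bit PRESENT S-box, used only as a small stand-in for the AES S-box.
SBOX4 = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
         0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hw(value):
    """Hamming weight of an integer."""
    return bin(value).count("1")


def gaussian(l, mean, sigma):
    return exp(-0.5 * ((l - mean) / sigma) ** 2) / (sigma * sqrt(2 * pi))


def mutual_information(sigma2, steps_per_sigma=20):
    """MI(K; L, X) in bits for L = HW(Sbox(k xor x)) + N(0, sigma2)."""
    sigma = sqrt(sigma2)
    dl = sigma / steps_per_sigma
    low, high = -6 * sigma, 4 + 6 * sigma      # HW of a nibble lies in [0, 4]
    n_points = int((high - low) / dl) + 1
    n_keys = len(SBOX4)
    acc = 0.0
    for x in range(n_keys):
        means = [hw(SBOX4[k ^ x]) for k in range(n_keys)]
        for step in range(n_points):
            l = low + step * dl
            pdfs = [gaussian(l, m, sigma) for m in means]
            total = sum(pdfs)
            if total == 0.0:
                continue
            for p in pdfs:
                if p > 0.0:
                    # Pr[l | k, x] * log2 Pr[k | l, x], with a uniform prior on k.
                    acc += p * log2(p / total) * dl
    # H[K] = log2(16) bits; Pr[k] * Pr[x] = 1 / (16 * 16).
    return log2(n_keys) + acc / (n_keys * n_keys)


if __name__ == "__main__":
    for sigma2 in (0.01, 1.0, 100.0):
        print(sigma2, round(mutual_information(sigma2), 4))
```

In the low-noise limit the leakage reveals exactly the Hamming weight of the S-box
output, so the estimate approaches the entropy of that Hamming weight (about 2.03
bits for a nibble), and it decreases as the noise variance grows.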

¹ As discussed in [22], this mutual information can only be perfectly estimated when
the evaluator knows the exact leakage model of his target device. This only happens
in simulated analyses (e.g. as will be performed in the next section). Whenever a
practical evaluation is carried out, it is formally a “perceived information” that is
evaluated, with the goal to be as close as possible to the mutual information.

1. No Permutation Leakage, i.e. the adversary gets 16 leakage cycles, and
each of them could correspond to the target subkey with probability 1/16.
That is:

\[ \Pr[L = l \mid K_s = k] = \sum_{c=0}^{15} \frac{1}{16} \Pr[L_c = l_c \mid K_s = k]. \]

We will refer to this attack as case (1.a). Besides, and as mentioned in the
introduction, a usual trick to attack shuffled implementations is to integrate over the
leakage cycles. In this case, the adversary defines a variable $L = \sum_c L_c$, and
performs the attack against this variable. It boils down to considering 15 cycles out
of 16 as “algorithmic noise”. We will refer to this attack as case (1.b).

2. Leakage on the Permutation. In the same way as all the shares are assumed
to leak in a masked implementation, it is natural to assume that the
manipulation of a permutation may leak in a shuffled implementation. In practice,
such leakages usually appear each time the permutation is manipulated in
the microcode, e.g. when fetching the $S_c$’th part of the key, or when jumping to
the $S_c$’th piece of code computing an S-box. We now show how to perform an
information theoretic evaluation in these cases. As previously, the impact of
different implementations of the countermeasure affects the term $\Pr[L = l \mid K_s = k]$.
For this purpose, we start with the following general formulation:

\[ \Pr[L = l \mid K_s = k] = \sum_{c=0}^{15} \mathsf{f}(c, s, l') \Pr[L_c = l_c \mid K_s = k], \]

with $l'$ the vector of 16 leakages on the previously defined variable $S_c$
(indicating the part of the master key used at time $c$). The function $\mathsf{f}$ essentially
indicates how the knowledge available about this variable can be exploited by
the adversary, as witnessed by the five examples that we now describe.

2.a. Unprotected implementation. In this case, we have $\mathsf{f}(c, s, l') = 1$ if $c = s$
and 0 otherwise (i.e. the adversary knows exactly where each key byte is manipulated).

2.b. Direct template attack. In this case, we just add the permutation leakage
in the conditional probabilities, yet without making any difference between the
RSI and RP cases, by computing $\mathsf{f}(c, s, l') = \Pr[L'_c = l'_c \mid S_c = s]$. Note that
the case with no permutation leakage corresponds to $\mathsf{f}(c, s, l') = 1/16$.

2.c. Taking advantage of RSI. Here, the adversary exploits the fact that only
16 permutations are possible (out of the 16! ones), which can be enumerated.
Hence, he can compute:

\[ \mathsf{f}(c, s, l') = \prod_{i=0}^{15} \Pr[L'_i = l'_i \mid S_i = (s - c + i) \bmod 16]. \]

Contrary to the RSI case, using a RP implies that the permutation is picked
randomly among $16! \approx 2^{44}$ possible ones, which are significantly harder to
enumerate. Hence, our following experiments will additionally consider two heuristic
solutions that can be used to mitigate this issue and attack more efficiently.

2.d. Restricted enumeration against RP. In this case, the function $\mathsf{f}$ is identical
to the exhaustive one, i.e. $\mathsf{f}(c, s, l') = \prod_{i=0}^{15} \Pr[L'_i = l'_i \mid S_i = p(i)]$,
but the sum over the permutations $p$ only goes over an enumerable subset of the most
probable ones. A beam search is used for this purpose. This is a breadth-first search
that limits the number of nodes (i.e. permutations in the sum) by pruning the least
probable ones, which is done by weighting the permutations $p$ with
$\prod_{i=0}^{15} \Pr[L'_i = l'_i \mid S_i = p(i)]$.

2.e. Excluding heuristic. One alternative option to simplify the enumeration is to
consider that whenever $S_c = s$, we have that $c \neq c'$ implies $S_{c'} \neq S_c$. This
can be reflected with:

\[ \mathsf{f}(c, s, l') = \Pr[L'_c = l'_c \mid S_c = s] \cdot
   \prod_{c' \neq c} \left( 1 - \Pr[L'_{c'} = l'_{c'} \mid S_{c'} = s] \right), \]

which, up to normalization, is equivalent to:

\[ \mathsf{f}(c, s, l') = \frac{\Pr[L'_c = l'_c \mid S_c = s]}{1 - \Pr[L'_c = l'_c \mid S_c = s]}. \]

Overall, an intuition on the security of different implementations is obtained by
quantifying the number of possible execution orders considered by the adversary
(which may be more than the actual number of permutations, if attacks do not
fully exploit their structure). In the unprotected case, only one order can occur.
For the direct template attack, the adversary does not combine the different pieces
of information on the $S_c$ and we implicitly have $16^{16}$ possible execution orders.
In the RSI case, we exploit the fact that only 16 permutations are possible. The
attack enumerating all possible permutations lists all 16! hypotheses. Finally, the
excluding heuristic implicitly allows $16 \times 15^{15}$ of them. This situation can be
seen as an error-correcting problem where 16 noisy values are transmitted, each of
which can be an integer from 0 to 15. The security of the countermeasure relies on a
large probability of decoding error. In the RSI case, we only have 16 possible
codewords, which gives us a very resilient code, lowering the probability of errors
and thereby the strength of the countermeasure. For a RP, we have 16! codewords over
a space of $16^{16}$ possible transmissions, hence increasing the probability of
decoding errors.
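
The sizes of these hypothesis spaces are easy to tabulate. The following Python
lines only restate the counts given above on a log2 scale; the dictionary labels
are ours.

```python
from math import factorial, log2

# Number of execution orders implicitly considered by each attack strategy.
ORDERS = {
    "unprotected": 1,
    "RSI enumeration": 16,
    "RP enumeration": factorial(16),
    "excluding heuristic": 16 * 15 ** 15,
    "direct template attack": 16 ** 16,
}

for name, count in ORDERS.items():
    print(f"{name:24s} {count:>22d}  (2^{log2(count):.2f})")
```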

As far as performing these attacks/evaluations in practice is concerned, case
(a) is a classical template attack for which the computational complexity is
usually neglected. Carrying out attacks/evaluations where $L'$ is exploited naturally
requires building additional templates. Yet, the computational complexity of
cases (b), (c) and (e) can also be neglected, as they only imply a few additional
arithmetic operations. In fact, only case (d) may require intensive computations,
if all permutations with non-negligible likelihood (with respect to $L'$) are taken
into account by the beam search. As will be shown in the next section, increasing
the noise gradually implies that all permutation candidates have more similar
likelihoods, so that more of them have to be enumerated. Hence, this last attack is
only applicable for low noise levels.
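
A beam search over permutations, as used in case (d), can be sketched as follows.
This is a generic textbook version written for illustration, not the authors'
implementation: `T[i][s]` is assumed to hold Pr[L'_i = l'_i | S_i = s], partial
permutations are extended one cycle at a time, and only the `width` most likely
partial permutations survive each step.

```python
def beam_search(T, width):
    """Return up to `width` (permutation, weight) pairs, most likely first."""
    n = len(T)
    beam = [((), 1.0)]                      # start from the empty prefix
    for i in range(n):                      # decide which subkey runs in cycle i
        candidates = []
        for prefix, weight in beam:
            used = set(prefix)
            for s in range(n):
                if s not in used:           # a permutation never repeats an index
                    candidates.append((prefix + (s,), weight * T[i][s]))
        candidates.sort(key=lambda item: item[1], reverse=True)
        beam = candidates[:width]           # prune the least probable nodes
    return beam


if __name__ == "__main__":
    # Toy 4-cycle example in which the identity permutation is the most likely.
    T = [[0.7 if i == s else 0.1 for s in range(4)] for i in range(4)]
    for perm, weight in beam_search(T, width=3):
        print(perm, weight)
```

When the noise grows, the weights of the candidates become similar and a much
larger `width` is needed to keep the relevant permutations, which is the
limitation mentioned above.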


Security Analysis. The second step of our evaluation is to perform a security
analysis. It allows measuring the extent to which the different strategies listed
above impact the data complexity of successful side-channel attacks.
For this purpose, we apply template attacks with the key selected as:

\[ \tilde{k} = \operatorname*{argmax}_{k} \prod_{j=1}^{q}
   \Pr[L'^{j} = l'^{j}, L^{j} = l^{j} \mid K_s = k], \]

and we compute their success rate as a function of the data complexity $q$.
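
A maximum likelihood key selection of this kind can be sketched in Python by
combining the general formulation Pr[L = l | K_s = k] = sum over c of
f(c, s, l') Pr[L_c = l_c | K_s = k] over several traces. The sketch makes several
assumptions that are not in the paper: a 4-bit PRESENT S-box replaces the AES
S-box, templates are Gaussians centered on the Hamming weight, and the weighting
function `f` is passed in as a parameter. All names are hypothetical.

```python
from math import exp, log

# 4-bit PRESENT S-box, used only as a small stand-in for the AES S-box.
SBOX4 = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
         0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hw(value):
    return bin(value).count("1")


def log_likelihood(k, s, traces, f, sigma2):
    """Sum over traces of log( sum_c f(c, s, l') * Pr[L_c = l_c | K_s = k] )."""
    total = 0.0
    for x_s, key_leak, perm_leak in traces:
        mean = hw(SBOX4[k ^ x_s])           # predicted leakage for candidate k
        mix = sum(f(c, s, perm_leak) * exp(-(l_c - mean) ** 2 / (2 * sigma2))
                  for c, l_c in enumerate(key_leak))
        total += log(mix) if mix > 0.0 else float("-inf")
    return total


def best_key(s, traces, f, sigma2):
    """Candidate maximizing the likelihood of all observed traces."""
    return max(range(len(SBOX4)),
               key=lambda k: log_likelihood(k, s, traces, f, sigma2))


def f_unprotected(c, s, perm_leak):
    return 1.0 if c == s else 0.0


def f_uniform(c, s, perm_leak):
    return 1.0 / 16
```

Each trace is a tuple (x_s, key_leak, perm_leak) holding the known input of the
target S-box, the 16 key leakages and the 16 permutation leakages. Passing
`f_uniform` gives the attack with uniform prior, while `f_unprotected` gives the
reference attack on an unshuffled implementation.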
4 Simulated Experiments

In order to gauge the impact of the proposed formulations and attacks, we
first carried out various experiments against simulated AES implementations. For this
purpose, we re-use the notations introduced in the previous section and assume
that the adversary is provided with a key leakage vector $L$ with elements
$L_c = \mathsf{HW}(\mathsf{Sbox}(k_{P(c)} \oplus X_{P(c)})) + \mathcal{N}(0, \sigma^2)$, and
possibly a permutation leakage vector $L'$ with elements of the form
$L'_c = \mathsf{HW}(S_c) + \mathcal{N}(0, \sigma^2)$. In both cases,
the second term is a Gaussian distributed random noise, with variance $\sigma^2$ that
we will use as a parameter of our evaluations. Using these notations, there are
various contexts that could be investigated. As illustrated in Table 3, we classify
them along two axes: the target device and the adversary’s means.
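
The simulated leakage model can be reproduced with a few lines of Python. As an
assumption made to keep the sketch short, the AES S-box is replaced by the 4-bit
PRESENT S-box, so that keys and plaintexts are lists of 16 nibbles; the function
name and the `mode` parameter are hypothetical.

```python
import random

# 4-bit PRESENT S-box, used only as a small stand-in for the AES S-box.
SBOX4 = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
         0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hw(value):
    return bin(value).count("1")


def simulate_trace(key, plaintext, sigma2, rng, mode="RP"):
    """Return (perm, key_leak, perm_leak) for one shuffled S-box layer."""
    n = len(key)
    if mode == "RSI":                       # random start index
        r = rng.randrange(n)
        perm = [(c + r) % n for c in range(n)]
    elif mode == "RP":                      # random permutation
        perm = list(range(n))
        rng.shuffle(perm)
    else:                                   # unprotected: S_c = c
        perm = list(range(n))
    sigma = sigma2 ** 0.5
    # L_c = HW(Sbox(k_P(c) xor x_P(c))) + noise
    key_leak = [hw(SBOX4[key[perm[c]] ^ plaintext[perm[c]]]) + rng.gauss(0.0, sigma)
                for c in range(n)]
    # L'_c = HW(S_c) + noise, with S_c = P(c)
    perm_leak = [hw(perm[c]) + rng.gauss(0.0, sigma) for c in range(n)]
    return perm, key_leak, perm_leak


if __name__ == "__main__":
    rng = random.Random(4)
    key = [rng.randrange(16) for _ in range(16)]
    plaintext = [rng.randrange(16) for _ in range(16)]
    print(simulate_trace(key, plaintext, 1.0, rng, mode="RSI"))
```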


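Table 3. Classification of the attacks

                                       Target devices
Adversary's means | Unp.         | RSI shuf.                  | RP shuf.
------------------+--------------+----------------------------+-------------------
L                 | UNP-TA (2.a) | INT-TA (1.b), UNI-TA (1.a) (both shuffled targets)
L, L'             | -            | DPLEAK-TA (2.b) (both shuffled targets)
L, L' + comp.     | -            | RSIENUM-TA (2.c)           | RESENUM-TA (2.d),
                  |              |                            | EXCLUDING-TA (2.e)
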


As far as the target device is concerned, we considered the case of an
unprotected implementation for reference, an RSI-based shuffled implementation
and a RP-based shuffled implementation. As far as the adversary’s means are
concerned, we first analyzed attacks where only the key leakage vector $L$ is
available. Next we evaluated attacks where the permutation leakage vector $L'$ is
additionally provided. Finally, we quantified the efficiency gains obtained when
exploiting computational power, in order to enumerate (i.e. sum over) the
possible permutations. Overall, this gives rise to seven attacks:

1. Template attack against the unprotected implementation (UNP-TA), i.e. the
straightforward case where S-boxes are executed in deterministic order.

2. Template attack against integrated leakages (INT-TA), i.e. the attack against
shuffled implementations previously used, e.g. in [?].

In these first two cases, template attacks and correlation DPA are essentially
equivalent given that they exploit the same leakage model [?]. For coherence,
we will keep on using template attacks everywhere. But as the experiments in
Section 5 target a microcontroller with strong Hamming weight leakage
dependencies, simpler (non-profiled) attacks would naturally apply as well. By
contrast, the following attacks explicitly take advantage of a Bayesian description.

3. Template attack with uniform $S_c$ (UNI-TA). In this case, the adversary follows
the Bayesian strategy but does not exploit any information on the
permutation (i.e. he assumes a uniform prior on the leakage cycles). Hence, the
attacks still have identical efficiencies in the RSI and RP cases.

4. Template attack with direct permutation leakage (DPLEAK-TA). It
corresponds to the attack (2.b) described in the previous section. Here, the
leakage vector $L'$ is simply added in the adversary’s conditional probabilities.
But again, it does not distinguish between the RSI and RP cases.

5. Template attack with permutation leakage enumerating the RSI
permutations (RSIENUM-TA). It corresponds to the attack (2.c) in the previous
section, where the adversary takes advantage of the 16 permutations that a
RSI-based shuffling tolerates to combine its permutation leakages.

6. Template attacks with restricted enumeration (RESENUM-TA). It corresponds
to the attack (2.d) described in the previous section. A beam search [22] is
performed to enumerate the most likely permutations.

7. Template attacks with excluding heuristic (EXCLUDING-TA). It corresponds
to the attack (2.e) in the previous section, where the likelihood of the
permutations is weighted by simply excluding duplicates.



The result of a simulated information theoretic analysis for these different attacks
is given in Figure 1, as a function of the noise variance. Several observations can be
highlighted. First, and as usual in such worst-case evaluations, the asymptotic
trend only appears for large noise levels. In this respect, the main conclusion is
that (unlike masking [23]) the slope of the MI curves is the same for both the
unprotected and all the shuffled implementations. Intuitively, it suggests that
shuffling can (at best) be used to amplify the noise existing in side-channel
measurements (i.e. imply a shift of the IT curves). Besides, one can observe that
for lower noise levels, significant differences arise between the different scenarios
of Table 3. For example, it is interesting to note that even without exploiting
permutation leakage, the integrated attack is less efficient than the template
attack with uniform prior. It confirms that this integrated attack is suboptimal
in a profiled case, and is not suited to evaluate the worst-case security of
an implementation in low-noise scenarios. Quite naturally, the distance between
integrated and stronger attacks increases as permutation leakage becomes
available. In this setting, the amount of information extracted is quite dependent on
the countermeasure implemented. If the RSI approach is chosen (and this
information is exploited computationally), the implementation turns out to be as weak
as an unprotected one until noise levels beyond $\sigma^2 = 2^0$. By contrast, in the RP
case, the noise amplification happens earlier. In this respect, it is worth noticing
the limited difference between the DPLEAK-, EXCLUDING-, and RESENUM-TAs for
RP-shuffled implementations, the latter ones only bringing a small advantage.
We also observe that, as expected, the RESENUM-TA could only be launched until
noise levels of approximately $\sigma^2 = 2^{-2}$: beyond this threshold, the large number
of permutations to enumerate with the beam search turned out to be hardly
tractable. This last fact confirms the expectation in Section 2 that the small
bias resulting from our efficient permutation generation algorithm should not
lead to significantly improved side-channel attacks.

Note that the insecurity of RSI-based shuffling (and, to a lesser extent,
RP-based shuffling) for low noise levels has to be interpreted with care. What our
analysis shows is not that the start index or permutation is trivially revealed
with a template-based SPA (as the number of permutation candidates in the
beam search already explodes when $\sigma^2 = 2^{-2}$). It is really the fact that the
16 leakage samples of the permutation can be exploited jointly that makes these
countermeasures weak. In other words, what these results show is the importance
of computational power in the evaluation of shuffling: summing over 16 cases is
easy, summing over 16! ones is harder, as highlighted by the different curves of the
RSIENUM-TA and EXCLUDING-/RESENUM-TA information theoretic evaluations.

As a complement to the information theoretic analyses, we performed a security
analysis, and computed the success rates of our different attacks, as a function
of the number of plaintexts measured by the adversary. This allows translating
the IT curves of Figure 1 into data complexities. For illustration, we selected
three different noise variances, corresponding to low (i.e. $\sigma^2 = 2^{-3}$), middle
(i.e. $\sigma^2 = 2^0$) and large (i.e. $\sigma^2 = 2^3$) noise levels (where large refers to
the fact that the IT curves are merging at this stage). The results of these
simulated experiments are given in Figure 2 and confirm the previous observations.
We again observe the weakness of the RSI-based shuffling in the low noise level
case, and the lower efficiency of the integrated attack. The success rate curves also
exhibit the slight advantage of the heuristic enumeration when exploiting the leakage
of a RP for the smallest noise level, as well as the better behavior of the
(computationally cheap) excluding heuristic when the noise increases (again, the
RESENUM-TA evaluation could only be performed in the low noise case, i.e. upper figure).

5 Practical Experiments 

The previous simulated attacks naturally raise the question whether our attacks
similarly apply to real-world implementations. In order to validate our
conclusions, we also performed these attacks against shuffled implementations of the
AES, based on the randomized execution path technique of Section 2.

[Figure 2: three panels plotting the success rate against the number of traces for
the UNP-TA, INT-TA, UNI-TA, DPLEAK-TA, RSIENUM-TA, EXCLUDING-TA and RESENUM-TA attacks.]

Fig. 2. Success rates of simulated attacks, $\sigma^2 = 2^{-3}$ (top), $2^0$ (middle), $2^{+3}$ (bottom)

Our target device is an 8-bit Atmel microcontroller, and our measurement
setup monitored the voltage variations of this target device over a small
resistor inserted in our supply circuit, with a digital oscilloscope. Based on this
setup, we profiled our implementation and built the probability distributions
of the vectors $L$ and $L'$. That is, we first estimated 16 templates corresponding
to the leakages of the permutation indexes $c$, i.e. $\Pr[L'_c \mid S_c = s]$. Next, we
constructed 16 × 16 templates for the key leakages at the output of the S-box,
i.e. $\Pr[L_c \mid K_s = k]$, for each value of $c$ and $s$. The reason for having the
16 × 16 sets of key leakage templates is that these leakages behave differently
when, at a given point in time, different subkeys are used. That is, we have
$\Pr[L_c \mid K_{s_1}] \neq \Pr[L_c \mid K_{s_2}]$ if $s_1 \neq s_2$, and
$\Pr[L_{c_1} \mid K_s] \neq \Pr[L_{c_2} \mid K_s]$ if $c_1 \neq c_2$.
This fact is due to the slightly different power consumptions of different
registers and memory accesses of the Furious implementation in our target device.

In order to limit the profiling effort, our templates were kept univariate and
constructed with the stochastic approach from [27], using the Hamming weight of
the S-box outputs as base vectors. Interestingly, the fact that different key
bytes give rise to different templates leads to indirect leakages on the
permutation. That is, we have
$\Pr[L_c \mid K_{s_1}] \neq \Pr[L_c \mid K_{s_2}]$ for a fixed cycle $c$. By
summing over the 256 key candidates, we can then obtain marginal probabilities
$\Pr[L_c = l_c \mid K_s]$ for all key byte indexes $s$. This directly leads to
useful information of the type:


$$\Pr[S_c = s \mid L_c = l_c] = \frac{\Pr[L_c = l_c \mid K_s]}{\sum_{s'} \Pr[L_c = l_c \mid K_{s'}]}$$


Furthermore, this information is directly reflected in all the Bayesian attacks,
without any modification of their descriptions (including UNI-TA, for which
direct permutation leakages are ignored). That is, the mere fact that we built
16 x 16 templates for the different values of $s$ and $c$ allows us to exploit it.
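The normalization above can be illustrated with a small sketch. Everything
numeric in it is an assumption made for the illustration: the templates are
modeled as univariate Gaussians whose mean is a Hamming weight plus a
hypothetical per-index offset, and the 256 key candidates are treated as
equiprobable. It is not the profiling code used in the experiments.

```python
import math

NOISE_VARIANCE = 3.25  # noise level quoted for the experiments
# Hypothetical templates: one (gain, offset) pair per key byte index s.
TEMPLATES = [(1.0, 0.1 * s) for s in range(16)]


def hamming_weight(value):
    """Number of set bits in a byte."""
    return bin(value).count("1")


def gaussian_pdf(x, mean, variance):
    """Density of a univariate Gaussian at x."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)


def marginal_likelihood(l_c, gain, offset, variance):
    """Pr[L_c = l_c | K_s], averaged over the 256 equiprobable S-box outputs."""
    total = sum(
        gaussian_pdf(l_c, gain * hamming_weight(v) + offset, variance)
        for v in range(256)
    )
    return total / 256


def permutation_posterior(l_c, templates, variance):
    """Pr[S_c = s | L_c = l_c] for every s, obtained by normalizing the marginals."""
    likelihoods = [marginal_likelihood(l_c, g, o, variance) for g, o in templates]
    total = sum(likelihoods)
    return [value / total for value in likelihoods]
```

With identical templates for all indexes the posterior would be uniform; any
difference between them shifts probability mass towards some indexes, which is
the indirect leakage discussed above.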

The success rates of our experimental attacks are illustrated in Figure 3,
where the noise level corresponds to $\sigma^2 = 3.25$. We observe that in this
real case study, the RSI-based shuffled implementation remains as easy to
attack as an unprotected one in our worst-case evaluation setting. Besides, we
note that the indirect leakage is quite useful for the template attack with
uniform prior. One important consequence of this indirect leakage is that the
uni-ta could also apply to our countermeasure with randomized program memory,
even if the precomputation was performed in a perfectly secure (i.e.
leakage-free) environment. Interestingly, we also remark that the integrated
attack is less efficient than in our simulated experiments, and remains stuck
at a very low success rate for the data complexities we considered (yet, it
eventually succeeded for larger numbers of measurements). This can be explained
by two main reasons. First, the leakages on the permutation extracted with our
templates (including the indirect ones) were larger than in our simulations,
which naturally increases the gap between the integrated attack and the others.
Second, the fact that different Atmel resources leak according to different
models creates additional noise for the integrated attack, due to a modeling
error (i.e. these differences are lost after integration).
[Figure: success rates of the experimental attacks. Legend: UNP-TA, INT-TA,
UNI-TA + indirect leakage, RSIENUM-TA + indirect leakage, EXCLUDING-TA +
indirect leakage, DPLEAK-TA + indirect leakage.]



6 Conclusions 

In this paper, we first proposed two new implementations of the shuffling counter- 
measure in small (e.g. 8-bit) microcontrollers. They respectively allow improved 
performances in terms of overall cycle count and online cycle count. Next, we 
provided the first comprehensive evaluation of the shuffling countermeasure, in- 
cluding worst-case Bayesian attacks. For this purpose, we described intuitive 
formulas capturing the different variants of shuffling, and integrated them in a 
general evaluation framework from Eurocrypt 2009. These evaluation tools
allowed us to show that previously used integrated attacks may not be enough
for assessing the security of shuffled implementations. We put forward that
simplifying the permutation generation (e.g. by using RSI rather than RP) can
lead to a complete breakdown of the countermeasure if measurements with
sufficiently low noise are available (which was confirmed in a practical case
study). We also explained the computational origin of these weaknesses (i.e.
their relation with the total number of permutations that are considered in the
countermeasure). Finally, we showed that indirect leakages may be available in
shuffled implementations, due to the different leakage models of different
resources. This suggests an interesting scope for further research. Namely,
since our results show that randomizing the order of instructions in
cryptographic implementations is not always sufficient, can we design efficient
ways to randomize both the execution order and the physical resources used in a
cryptographic implementation?

Acknowledgements. This work has been funded in part by the ERC project
280141 (acronym CRASH) and the 7th framework European project TAMPRES.
S. Kerckhof is a PhD student funded by a FRIA grant. F.-X. Standaert is a
Research Associate of the Belgian Fund for Scientific Research (FNRS-F.R.S).




References 


1. Amarilli, A., Muller, S., Naccache, D., Page, D., Rauzy, P., Tunstall, M.: Can 
Code Polymorphism Limit Information Leakage? In: Ardagna, C.A., Zhou, J. (eds.) 
WISTP 2011. LNCS, vol. 6633, pp. 1-21. Springer, Heidelberg (2011) 

2. Anckaert, B., Madou, M., De Bosschere, K.: A Model for Self-Modifying Code. In: 
Camenisch, J.L., Collberg, C.S., Johnson, N.F., Sallee, P. (eds.) IH 2006. LNCS, 
vol. 4437, pp. 232-248. Springer, Heidelberg (2007) 

3. Atmel, http://www.atmel.com/products/microcontrollers/avr/ 

4. Bayrak, A.G., Velickovic, N., Ienne, P., Burleson, W.: An architecture-independent 
instruction shuffler to protect against side-channel attacks. TACO 8(4), 20 (2012) 

5. Koç, Ç.K., Paar, C. (eds.): CHES 2000. LNCS, vol. 1965. Springer, Heidelberg (2000)

6. Chari, S., Rao, J.R., Rohatgi, P.: Template Attacks. In: Kaliski Jr., B.S., Koç,
Ç.K., Paar, C. (eds.) CHES 2002. LNCS, vol. 2523, pp. 13-28. Springer, Heidelberg
(2003) 

7. Clavier, C., Coron, J.-S., Dabbous, N.: Differential power analysis in the presence 
of hardware countermeasures. In: Koç, Ç.K., Paar [5], pp. 252-263

8. Coron, J.-S.: A New DPA Countermeasure Based on Permutation Tables. In: Os- 
trovsky, R., De Prisco, R., Visconti, I. (eds.) SCN 2008. LNCS, vol. 5229, pp. 
278-292. Springer, Heidelberg (2008) 

9. Atmel Corporation: 8-bit Microcontroller with 16K/32K/64K Bytes In-System
Programmable Flash - ATmega164P/V, ATmega324P/V, ATmega644P/V, Rev.
80110-07/10 (2010), http://www.atmel.com/images/8011s.pdf

10. Daemen, J., Rijmen, V.: The Design of Rijndael: AES - The Advanced Encryption 
Standard. Springer (2002) 

11. Feldhofer, M., Popp, T.: Power Analysis Resistant AES Implementation for Passive
RFID Tags. In: Lackner, C., Ostermann, T., Sams, M., Spilka, R. (eds.) Austrochip
2008, pp. 1-6 (2008)

12. Goldwasser, S., Kalai, Y.T., Rothblum, G.N.: One-Time Programs. In: Wagner, D. 
(ed.) CRYPTO 2008. LNCS, vol. 5157, pp. 39-56. Springer, Heidelberg (2008) 

13. Herbst, C., Oswald, E., Mangard, S.: An AES Smart Card Implementation Resis- 
tant to Power Analysis Attacks. In: Zhou, J., Yung, M., Bao, F. (eds.) ACNS 2006. 
LNCS, vol. 3989, pp. 239-252. Springer, Heidelberg (2006) 

14. Knuth, D.E.: The Art of Computer Programming, vol. 2: Seminumerical
Algorithms, 3rd edn. Addison-Wesley Publishing, Boston (1997)

15. Kocher, P.C., Jaffe, J., Jun, B.: Differential Power Analysis. In: Wiener, M. (ed.) 
CRYPTO 1999. LNCS, vol. 1666, pp. 388-397. Springer, Heidelberg (1999) 

16. Mangard, S.: A Simple Power-Analysis (SPA) Attack on Implementations of the
AES Key Expansion. In: Lee, P.J., Lim, C.H. (eds.) ICISC 2002. LNCS, vol. 2587, 
pp. 343-358. Springer, Heidelberg (2003) 

17. Mangard, S.: Hardware Countermeasures against DPA - A Statistical Analysis 
of Their Effectiveness. In: Okamoto, T. (ed.) CT-RSA 2004. LNCS, vol. 2964, 
pp. 222-235. Springer, Heidelberg (2004) 

18. Mangard, S., Oswald, E., Standaert, F.-X.: One for all — all for one: Unifying 
standard dpa attacks. IET Information Security 5(2), 100-110 (2011) 

19. May, D., Muller, H.L., Smart, N.P.: Non-deterministic Processors. In: Varadhara- 
jan, V., Mu, Y. (eds.) ACISP 2001. LNCS, vol. 2119, pp. 115-129. Springer, Hei- 
delberg (2001) 




20. Medwed, M., Standaert, F.-X., Großschädl, J., Regazzoni, F.: Fresh Re-keying:
Security against Side-Channel and Fault Attacks for Low-Cost Devices. In: Bern- 
stein, D.J., Lange, T. (eds.) AFRICACRYPT 2010. LNCS, vol. 6055, pp. 279-296. 
Springer, Heidelberg (2010) 

21. Messerges, T.S.: Using second-order power analysis to attack dpa resistant software. 
In: Koç, Ç.K., Paar [5], pp. 238-251

22. Moradi, A., Mischke, O., Paar, C.: Practical evaluation of dpa countermeasures on 
reconfigurable hardware. In: HOST, pp. 154-160. IEEE Computer Society (2011) 

23. Poettering, B.: Rijndael Furious, http://point-at-infinity.org/avraes/ 

24. Prouff, E., Rivain, M., Bevan, R.: Statistical analysis of second order differential 
power analysis. IEEE Trans. Computers 58(6), 799-811 (2009) 

25. Renauld, M., Standaert, F.-X., Veyrat-Charvillon, N., Kamel, D., Flandre, D.: A 
Formal Study of Power Variability Issues and Side-Channel Attacks for Nanoscale 
Devices. In: Paterson, K.G. (ed.) EUROCRYPT 2011. LNCS, vol. 6632, pp. 109- 
128. Springer, Heidelberg (2011) 

26. Rivain, M., Prouff, E., Doget, J.: Higher-Order Masking and Shuffling for Software 
Implementations of Block Ciphers. In: Clavier, C., Gaj, K. (eds.) CHES 2009. 
LNCS, vol. 5747, pp. 171-188. Springer, Heidelberg (2009) 

27. Schindler, W., Lemke, K., Paar, C.: A Stochastic Model for Differential Side Chan- 
nel Cryptanalysis. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, 
pp. 30-46. Springer, Heidelberg (2005) 

28. Standaert, F.-X., Malkin, T.G., Yung, M.: A Unified Framework for the Analysis 
of Side-Channel Key Recovery Attacks. In: Joux, A. (ed.) EUROCRYPT 2009. 
LNCS, vol. 5479, pp. 443-461. Springer, Heidelberg (2009) 

29. Standaert, F.-X., Veyrat-Charvillon, N., Oswald, E., Gierlichs, B., Medwed, M., 
Kasper, M., Mangard, S.: The World Is Not Enough: Another Look on Second- 
Order DPA. In: Abe, M. (ed.) ASIACRYPT 2010. LNCS, vol. 6477, pp. 112-129. 
Springer, Heidelberg (2010) 

30. Tillich, S., Herbst, C.: Attacking State-of-the-Art Software Countermeasures — 
A Case Study for AES. In: Oswald, E., Rohatgi, P. (eds.) CHES 2008. LNCS, 
vol. 5154, pp. 228-243. Springer, Heidelberg (2008) 

31. Tunstall, M., Benoit, O.: Efficient Use of Random Delays in Embedded Software. 
In: Sauveron, D., Markantonakis, K., Bilas, A., Quisquater, J.-J. (eds.) WISTP 
2007. LNCS, vol. 4462, pp. 27-38. Springer, Heidelberg (2007) 

32. Zhang, W.: State-space search - algorithms, complexity, extensions, and applica- 
tions. Springer (1999) 


Theory and Practice of a Leakage Resilient 
Masking Scheme 


Josep Balasch¹, Sebastian Faust², Benedikt Gierlichs¹, and Ingrid Verbauwhede¹

¹ KU Leuven Dept. Electrical Engineering-ESAT/SCD-COSIC and IBBT
Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium
firstname.lastname@esat.kuleuven.be
² Aarhus University
Åbogade 34, DK-8200 Aarhus, Denmark
sfaust@cs.au.dk


Abstract. A recent trend in cryptography is to formally prove the leak- 
age resilience of cryptographic implementations - that is, one formally 
shows that a scheme remains provably secure even in the presence of 
side channel leakage. Although many of the proposed schemes are secure 
in a surprisingly strong model, most of them are unfortunately rather 
inefficient and come without practical security evaluations or implementation
attempts. In this work, we take a further step towards closing
the gap between theoretical leakage resilient cryptography and more
practice-oriented research. In particular, we show that masking counter- 
measures based on the inner product do not only exhibit strong theo- 
retical leakage resilience, but moreover provide better practical security 
or efficiency than earlier masking countermeasures. We demonstrate the 
feasibility of inner product masking by giving a secured implementation 
of the AES for an 8-bit processor. 

Keywords: Inner product masking, AES, Leakage resilience. 

1 Introduction 

Side channel attacks (SCA) are among the most relevant threats to the security
of implementations of cryptographic algorithms. Since the introduction of
timing attacks to the research community in the late 1990s, more side channels
have been discovered and more powerful attacks have been developed. It soon
became clear that masking, i.e. concealing all sensitive intermediate values of
a computation with random data, is an excellent way to prevent certain types of
attacks. As opposed to other countermeasures that aim at introducing noise in
the side channel, e.g. random delays, random order execution, dummy operations,
etc., one can formally argue about the security that masking provides.

The idea of $d$th-order masking is to split every sensitive intermediate value
in the implementation into $d + 1$ random shares, and to compute the algorithm
on these shares while maintaining that each tuple of $d$ shares is independent
of any sensitive value. The challenge is not to devise the masking scheme
itself, i.e. to determine how a sensitive intermediate value is split, but
rather to define the masked operations that process the independent shares,
while still preserving the correctness of the computation. A $d$th-order masked
implementation can, in theory, always be broken by a $(d+1)$th-order side
channel attack, i.e. an attack that exploits side channel leakage of $d + 1$
intermediate values in the masked implementation. However, given a sufficient
amount of noise, attacking a $d$th-order masked implementation becomes
exponentially more difficult in $d$. Motivated by this result, $d$th-order
masking schemes (that can be implemented at any order $d$) based on boolean
masking [22] and polynomial masking [18, 24] have been recently proposed.
Unfortunately, their security has so far been evaluated mainly by
practice-oriented researchers, while a formal proof-driven analysis is either
missing or is given only in a very weak security model.

In the theory community, masking-based countermeasures are analyzed within
the framework of leakage resilient circuit compilers introduced by Ishai et al.
A circuit compiler takes as input an arbitrary circuit $C$ computing over some
finite field and outputs a protected circuit $C'$ that has the same
functionality as $C$ but comes with built-in security against certain classes
of leakages. For the circuit compiler of Ishai et al., it can be shown that an
adversary that learns up to $d$ intermediate values during the computation of
the transformed circuit $C'$ does not learn anything beyond black-box access.
That is, for instance, if $C$ is an AES circuit then its implementation $C'$
exhibits the standard black-box security even in the presence of side-channel
leakage (in the given model).

The circuit compiler of Ishai et al. based on boolean masking with $d$ masks
has been recently extended, and a similar compiler (based on any linear secret
sharing scheme) protecting against broader classes of leakages has been
introduced. Despite this progress, it has been suggested that masking schemes
with greater algebraic complexity yield better resistance against side channel
attacks. As boolean masking schemes only achieve weak provable security
guarantees, attempts have been made to seek alternatives. First examples are
the compilers of Juma and Vahlis and of Goldwasser and Rothblum, which use a
public key encryption scheme as the underlying masking, i.e. every sensitive
variable is encrypted with a suitable public key encryption scheme. While such
compilers achieve strong security guarantees, namely, protection against any
polynomial-time computable leakage function, they suffer from poor efficiency
and provide theoretical feasibility results rather than a way towards a
practical solution.

In two recent works, it was shown that such strong theoretical security
guarantees can be achieved without relying on public-key encryption schemes.
Instead, these works propose a purely information theoretic solution based on
the inner product. While asymptotically these constructions are comparable to
schemes based on public key encryption, they have the potential to achieve much
better real-world efficiency as they only require simple algebraic operations.
In this work, we show that this is indeed the case, if one is willing to accept
a weaker security model. In a nutshell, our work shows that advances in leakage
resilient cryptography can indeed have implications for real-world
implementations and may even provide better practical security or efficiency
than existing schemes.
Contributions. We rely on ideas of Dziembowski and Faust for the inner
product (IP) masking, and adjust the masked operations to improve their
efficiency. As we are particularly interested in a secure implementation of the
Advanced Encryption Standard (AES), we can exploit the linearity of the squaring
operation in the underlying finite field $\mathbb{F}_{2^8}$. Moreover, we
slightly simplify the masked multiplication operation of the original
construction. All these changes are done without affecting the theoretical
security analysis. The bulk of our efficiency improvements, however, comes from
using a simpler method to refresh a masked secret. Such a refreshing scheme
takes as input a masked secret and outputs a masking of the same value with
completely fresh randomness. The construction that we use in our implementation
is essentially a simple variant of a previously proposed scheme. As such simple
schemes only satisfy weaker security properties, we need to make additional
restrictions to get a sound theoretical security analysis. We provide further
details on how our changes affect the security and what additional assumptions
are required later in the paper.

We also evaluate the security of the IP masking for practical parameters, i.e. 
when the number of shares is small. Our practical analysis reveals that the in- 
formation leakage of IP masking is more than two orders of magnitude smaller 
than that of boolean masking for low levels of noise and the same number of 
shares. Finally, we detail how the AES can be implemented in a secure way 
using the IP masking scheme, and we provide an implementation and perfor- 
mance results to demonstrate its correctness and feasibility. We show that in 
particular non-linear operations in the IP masked domain, e.g. multiplication, 
clearly outperform polynomial-based masking solutions that enjoy similar alge- 
braic complexity. 

2 Inner Product Masking 

In this section we introduce the circuit model assumed for the execution of 
the masked calculations, and we provide a detailed description of the masking 
scheme and its building blocks, including a complexity analysis and a comparison 
to other masking schemes. 

Circuit Model. Following the model of Dziembowski and Faust, we consider that
the target device running the masked computations contains two separate
processors. Each of these processors, in the following referred to as left
processor ($P_L$) and right processor ($P_R$), executes a part of the masked
operations. Communication between the processors is performed via a
bidirectional data bus. Such a model is introduced in order to provide a
framework to analyze the security of the masking scheme. As will be further
explained in the following sections, its main purpose is to facilitate the
assumption that $P_L$ and $P_R$ have completely independent side channel
leakage, i.e. an adversary can only retrieve information specific to each
physical processor. Notice that from a practical point of view, the required
independent side-channel leakage can also be obtained by temporal (rather than
physical) separation of the masked computations, e.g. in the context of
sequential software implementations on a single processor.
Overview. The IP masking scheme can be instantiated to secure operations
in any finite field $\mathbb{F}$ with $|\mathbb{F}| > 2$, such that all elements
and operations in $\mathbb{F}$ can be mapped to and performed in the masked
domain. This feature is extremely useful in the context of securing
cryptographic applications, as the underlying field of the masking scheme can
be adapted according to the characteristics of the cryptographic algorithm
and/or the target platform. Without loss of generality, and driven by our goal
to implement the AES, we provide in the following an efficient instantiation of
the IP masking scheme for the field $\mathbb{F}_{2^8}$ of characteristic two.

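Notation. We represent field elements with upper-case letters, e.g.
$X \in \mathbb{F}_{2^8}$, and we use $\oplus$ to denote field addition and
$\otimes$ to denote field multiplication. Vectors are represented with bold
upper-case letters, e.g. $\mathbf{X} \in \mathbb{F}_{2^8}^n$ such that
$\mathbf{X} = (X_1, \ldots, X_n)$. For two vectors
$\mathbf{X}, \mathbf{Y} \in \mathbb{F}_{2^8}^n$ we denote by
$\mathbf{X} \oplus \mathbf{Y}$ the vector addition in $\mathbb{F}_{2^8}^n$
calculated as $(X_1 \oplus Y_1, \ldots, X_n \oplus Y_n)$. The inner product
$\langle \mathbf{X}, \mathbf{Y} \rangle \in \mathbb{F}_{2^8}$ is calculated as
$\bigoplus_{i=1}^{n} X_i \otimes Y_i$.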

Construction. In the IP masking scheme each sensitive variable
$X \in \mathbb{F}_{2^8}$ is split into an even number of $2n$ shares such that:

$X = L_1 \otimes R_1 \oplus \ldots \oplus L_n \otimes R_n$. (1)

We denote $\mathbf{L} = (L_1, \ldots, L_n)$ as left vector and
$\mathbf{R} = (R_1, \ldots, R_n)$ as right vector. A variable $X$ is
represented in the masked domain as $(\mathbf{L}, \mathbf{R})$, and can be
recovered by calculating the inner product of these two vectors, i.e.
$X = \langle \mathbf{L}, \mathbf{R} \rangle$. In order to prevent a practically
exploitable bias between the shares and the masked value, it is required that
the elements of $\mathbf{L}$ belong to $\mathbb{F}_{2^8} \setminus \{0\}$. We
define $n \geq 2$ as the security parameter of our masking scheme.

Note that IP masking is a generalization of previously published masking
schemes. Indeed, one trivially derives boolean masking from Eq. (1) by e.g.
setting all elements in $\mathbf{L}$ (resp. $\mathbf{R}$) to one.
Multiplicative masking [2] can be achieved by setting $n = 2$ and either of the
shares $L_2$ and/or $R_2$ (resp. $L_1$ and/or $R_1$) to zero. Affine masking,
described in [12] as $V = (A \otimes X) \oplus B$, can be obtained by fixing
$n = 2$, $L_1 = L_2 = A^{-1}$, $R_1 = V$, and $R_2 = B$. Finally, as a secret
variable in polynomial masking [18, 24] is given by an interpolation polynomial
in the Lagrange form, such a masking scheme can be obtained by considering all
elements in $\mathbf{L}$ to be public Lagrange coefficients.

Algorithm 1 depicts the procedure IPMask() to convert a variable $X$ into the
IP masked domain as two vectors $(\mathbf{L}, \mathbf{R})$ of size $n$. The
function rand() returns a random element in $\mathbb{F}_{2^8}$, whereas the
function randNonZero() returns a random element in
$\mathbb{F}_{2^8} \setminus \{0\}$. The function IPUnmask() to convert a masked
variable $(\mathbf{L}, \mathbf{R})$ of size $n$ back to $X$ consists in
calculating the inner product $X = \langle \mathbf{L}, \mathbf{R} \rangle$.
Algorithm 1. Masking a variable: $(\mathbf{L}, \mathbf{R}) \leftarrow$ IPMask($X$)
Input: variable $X \in \mathbb{F}_{2^8}$
Output: masked variable $(\mathbf{L}, \mathbf{R})$
Ensure: $X = \langle \mathbf{L}, \mathbf{R} \rangle$

  $L_1 \leftarrow$ randNonZero()
  for $i = 2$ to $n$ do
    $L_i \leftarrow$ randNonZero(); $R_i \leftarrow$ rand()
  end for
  $R_1 \leftarrow (X \oplus \bigoplus_{i=2}^{n} L_i \otimes R_i) \otimes L_1^{-1}$
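The masking and unmasking procedures can be sketched in a few lines of Python.
This is only an illustration: the field is assumed to be represented with the
AES reduction polynomial (0x11B), the helper names are hypothetical, and the
code makes no attempt at constant-time execution or at separating the two
processors.

```python
import secrets

AES_POLY = 0x11B  # assumption: x^8 + x^4 + x^3 + x + 1, as used by the AES


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
        b >>= 1
    return result


def gf_inv(a):
    """Inverse of a non-zero byte, computed as a^254."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result


def ip_unmask(L, R):
    """Inner product <L, R> over GF(2^8)."""
    x = 0
    for l, r in zip(L, R):
        x ^= gf_mul(l, r)
    return x


def ip_mask(x, n=2):
    """Split x into (L, R) with non-zero L_i such that <L, R> = x."""
    L = [1 + secrets.randbelow(255) for _ in range(n)]       # randNonZero()
    R = [0] + [secrets.randbelow(256) for _ in range(n - 1)]  # rand()
    acc = x
    for l, r in zip(L[1:], R[1:]):
        acc ^= gf_mul(l, r)
    R[0] = gf_mul(acc, gf_inv(L[0]))  # R_1 fixes the inner product to x
    return L, R
```

For example, `ip_unmask(*ip_mask(0x3A, n=3))` returns `0x3A` whatever
randomness was drawn.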


2.1 Operations in the Masked Domain 

After introducing how to convert variables between $\mathbb{F}_{2^8}$ and the
IP masked domain, we need to provide a set of high-level functions that allows
us to operate directly on the masked variables. In order to fulfill our
security requirements, computations regarding the left vector $\mathbf{L}$ of
masked variables should be executed in the left processor $P_L$, whereas
calculations regarding $\mathbf{R}$ should be carried out in the right
processor $P_R$. Moreover, the condition that elements of the vector
$\mathbf{L}$ are different from zero must be inherited by all operations in
order to prevent output masked values from being biased.

In the following we make use of a special operation called IPHalfMask(), which
on input a variable $X$ and a vector $\mathbf{L}$ calculates the corresponding
vector $\mathbf{R}$ such that $X = \langle \mathbf{L}, \mathbf{R} \rangle$. It
is thus a simplified version of Algorithm 1 for which the left vector
$\mathbf{L}$ is already given and thus the elements $L_i$ do not need to be
sampled.

Another operation that will be often used is IPRefresh(). This operation,
depicted in Algorithm 2, takes as input a masked variable
$(\mathbf{L}, \mathbf{R})$ and returns a new one $(\mathbf{L}', \mathbf{R}')$
such that $\langle \mathbf{L}, \mathbf{R} \rangle = \langle \mathbf{L}', \mathbf{R}' \rangle$.
The purpose of the refreshing is to pump new randomness into the masking
scheme. Algorithm 2 is tailored in particular to work for the field
$\mathbb{F}_{2^8}$. For a generalization we refer the reader to the scheme on
which it is based.


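Algorithm 2. Refresh vector: $(\mathbf{L}', \mathbf{R}') \leftarrow$ IPRefresh($\mathbf{L}, \mathbf{R}$)
Input: vector $\mathbf{L}$ in processor $P_L$, vector $\mathbf{R}$ in processor $P_R$
Output: vector $\mathbf{L}'$ in processor $P_L$, vector $\mathbf{R}'$ in processor $P_R$
Ensure: $\langle \mathbf{L}, \mathbf{R} \rangle = \langle \mathbf{L}', \mathbf{R}' \rangle$

  $\mathbf{A} \leftarrow \mathbb{F}_{2^8}^n$ (sampled at random)
  $\mathbf{L}' = \mathbf{L} \oplus \mathbf{A}$
  $X =$ IPUnmask($\mathbf{A}, \mathbf{R}$)
  $\mathbf{B} =$ IPHalfMask($X, \mathbf{L}'$)
  $\mathbf{R}' = \mathbf{R} \oplus \mathbf{B}$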


Although not explicitly specified in Algorithm 2, it is necessary that the
vector $\mathbf{A}$ sampled by $P_L$ is such that the resulting elements of
$\mathbf{L}'$ are non-zero. In other words, we need to ensure that
$A_i \neq L_i$ for all $1 \leq i \leq n$. Details on how to implement this step
efficiently, in constant time and flow, are given later in the paper.
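The refreshing procedure, including the constraint $A_i \neq L_i$, can be
sketched as follows. The field representation (AES polynomial 0x11B) and the
helper names are assumptions of this illustration, and the rejection sampling
used to enforce $A_i \neq L_i$ is a simple stand-in that is not constant time,
unlike the method the text refers to.

```python
import secrets


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Inverse of a non-zero byte, computed as a^254."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result


def ip_unmask(L, R):
    """Inner product <L, R> over GF(2^8)."""
    x = 0
    for l, r in zip(L, R):
        x ^= gf_mul(l, r)
    return x


def ip_half_mask(x, L):
    """Given x and a left vector with L[0] != 0, return R with <L, R> = x."""
    R = [0] + [secrets.randbelow(256) for _ in L[1:]]
    acc = x
    for l, r in zip(L[1:], R[1:]):
        acc ^= gf_mul(l, r)
    R[0] = gf_mul(acc, gf_inv(L[0]))
    return R


def ip_refresh(L, R):
    """Return (L', R') with the same inner product and non-zero L'_i."""
    A = []
    for l in L:
        a = secrets.randbelow(256)
        while a == l:  # reject A_i = L_i so that L'_i = L_i xor A_i != 0
            a = secrets.randbelow(256)
        A.append(a)
    L_new = [l ^ a for l, a in zip(L, A)]
    x = ip_unmask(A, R)            # X = <A, R>
    B = ip_half_mask(x, L_new)     # <L', B> = X
    R_new = [r ^ b for r, b in zip(R, B)]
    return L_new, R_new
```

Correctness follows from
$\langle \mathbf{L} \oplus \mathbf{A}, \mathbf{R} \oplus \mathbf{B} \rangle = \langle \mathbf{L}, \mathbf{R} \rangle \oplus \langle \mathbf{A}, \mathbf{R} \rangle \oplus \langle \mathbf{L}', \mathbf{B} \rangle$,
where the last two terms both equal $X$ and cancel.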
Addition. The procedure IPAdd() to calculate the addition of two masked
variables is depicted in Algorithm 3. This algorithm requires three vector
additions, two joint executions of IPRefresh(), one of IPUnmask(), and one of
IPHalfMask().


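Algorithm 3. Masked addition: $(\mathbf{X}, \mathbf{Y}) \leftarrow$ IPAdd($(\mathbf{L}, \mathbf{R}), (\mathbf{K}, \mathbf{Q})$)
Input: vectors $\mathbf{L}$ and $\mathbf{K}$ in processor $P_L$, vectors $\mathbf{R}$ and $\mathbf{Q}$ in processor $P_R$
Output: vector $\mathbf{X}$ in processor $P_L$, vector $\mathbf{Y}$ in processor $P_R$
Ensure: $\langle \mathbf{X}, \mathbf{Y} \rangle = \langle \mathbf{L}, \mathbf{R} \rangle \oplus \langle \mathbf{K}, \mathbf{Q} \rangle$

  $(\mathbf{A}, \mathbf{B}) \leftarrow$ IPRefresh($\mathbf{K}, \mathbf{Q} \oplus \mathbf{R}$)
  $(\mathbf{C}, \mathbf{D}) \leftarrow$ IPRefresh($\mathbf{L} \oplus \mathbf{K}, \mathbf{R}$)
  $Z \leftarrow$ IPUnmask($\mathbf{C}, \mathbf{D}$)
  $\mathbf{Y} \leftarrow$ IPHalfMask($Z, \mathbf{A}$)
  $\mathbf{X} = \mathbf{A}$;  $\mathbf{Y} = \mathbf{Y} \oplus \mathbf{B}$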


Notice that it might be the case that the component
$\mathbf{L} \oplus \mathbf{K}$ in the second execution of IPRefresh() has
elements equal to zero. While this is a source of first-order leakage in IP
masking, i.e. the probability $\Pr(Z = 0 \mid (L_i \oplus K_i) = 0)$ is twice
that of any other value of $Z$, it is in this particular case not exploitable
by an attacker. This is because
$\Pr(\langle \mathbf{X}, \mathbf{Y} \rangle \mid Z = 0)$ is uniformly
distributed, i.e. knowing that the intermediate value $Z$ is zero does not give
any information about the sensitive output value
$\langle \mathbf{X}, \mathbf{Y} \rangle$.
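A masked addition built from two refreshes, one unmasking and one half-masking
can be sketched as below. The choice of arguments passed to the two refreshes
is one consistent reading that satisfies the Ensure condition; the field
representation (AES polynomial 0x11B), the helper names and the rejection
sampling are assumptions of this illustration.

```python
import secrets


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Inverse of a non-zero byte, computed as a^254."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result


def ip_unmask(L, R):
    x = 0
    for l, r in zip(L, R):
        x ^= gf_mul(l, r)
    return x


def ip_half_mask(x, L):
    R = [0] + [secrets.randbelow(256) for _ in L[1:]]
    acc = x
    for l, r in zip(L[1:], R[1:]):
        acc ^= gf_mul(l, r)
    R[0] = gf_mul(acc, gf_inv(L[0]))
    return R


def ip_refresh(L, R):
    # Rejection sampling keeps every element of the new left vector non-zero.
    A = []
    for l in L:
        a = secrets.randbelow(256)
        while a == l:
            a = secrets.randbelow(256)
        A.append(a)
    L_new = [l ^ a for l, a in zip(L, A)]
    B = ip_half_mask(ip_unmask(A, R), L_new)
    return L_new, [r ^ b for r, b in zip(R, B)]


def ip_add(L, R, K, Q):
    """Return (X, Y) with <X, Y> = <L, R> xor <K, Q>."""
    A, B = ip_refresh(K, [q ^ r for q, r in zip(Q, R)])  # <A,B> = <K,Q> ^ <K,R>
    C, D = ip_refresh([l ^ k for l, k in zip(L, K)], R)  # <C,D> = <L,R> ^ <K,R>
    Z = ip_unmask(C, D)
    Y = ip_half_mask(Z, A)                               # <A,Y> = Z
    return A, [y ^ b for y, b in zip(Y, B)]
```

The term $\langle \mathbf{K}, \mathbf{R} \rangle$ appears in both refreshed
values and cancels in the result. The second refresh accepts a left vector
containing zeros, since only its refreshed output is required to be non-zero.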

Addition of a Constant. The procedure IPAddConst() to add a constant
$Z \in \mathbb{F}_{2^8}$ to a masked variable $(\mathbf{L}, \mathbf{R})$ can be
carried out more efficiently than Algorithm 3. Let $(\mathbf{L}, \mathbf{R})$
and $Z$ be the input operands, and $(\mathbf{X}, \mathbf{Y})$ the output masked
variable. Addition of a constant can be simply calculated by letting
$\mathbf{X} = \mathbf{L}$ and $\mathbf{Y} = \mathbf{R}$, except for the first
element $Y_1 = R_1 \oplus (Z \otimes L_1^{-1})$.

Multiplication. The procedure IPMult() to calculate the multiplication of two
masked variables is depicted in Algorithm 4. This algorithm requires $2n^2$
initial field multiplications, one execution of IPRefresh() with input/output
vectors of size $n^2$, one execution of IPUnmask() with input vectors of size
$n^2 - n$, one execution of IPHalfMask(), and one final vector addition.

Multiplication by a Constant. The procedure IPMulConst() to multiply a
masked variable $(\mathbf{L}, \mathbf{R})$ by a constant
$Z \in \mathbb{F}_{2^8}$ is efficiently computed in IP masking. Let
$(\mathbf{L}, \mathbf{R})$ and $Z$ be the input operands, and
$(\mathbf{X}, \mathbf{Y})$ be the output masked variable. Multiplication by a
constant can be performed by letting $\mathbf{X} = \mathbf{L}$ and calculating
$\mathbf{Y} = (R_1 \otimes Z, \ldots, R_n \otimes Z)$. As will be further
explained later in the paper, it is not necessary to execute IPRefresh() after
IPMulConst().
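Both constant operations only touch the right vector, which a short sketch
makes concrete. The field representation (AES polynomial 0x11B) and the
function names are assumptions of this illustration.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Inverse of a non-zero byte, computed as a^254."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result


def ip_unmask(L, R):
    x = 0
    for l, r in zip(L, R):
        x ^= gf_mul(l, r)
    return x


def ip_add_const(L, R, z):
    """<X, Y> = <L, R> xor z: only the first right share changes."""
    Y = list(R)
    Y[0] ^= gf_mul(z, gf_inv(L[0]))  # L_1 * (z * L_1^-1) contributes exactly z
    return list(L), Y


def ip_mul_const(L, R, z):
    """<X, Y> = <L, R> * z: scale every right share by z."""
    return list(L), [gf_mul(r, z) for r in R]
```

Multiplying every $R_i$ by $Z$ works because multiplication distributes over
the sum in Eq. (1), while the left vector, and hence its non-zero property, is
left untouched by both operations.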
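Algorithm 4. Masked multiplication: $(\mathbf{X}, \mathbf{Y}) \leftarrow$ IPMult($(\mathbf{L}, \mathbf{R}), (\mathbf{K}, \mathbf{Q})$)
Input: vectors $\mathbf{L}$ and $\mathbf{K}$ in processor $P_L$, vectors $\mathbf{R}$ and $\mathbf{Q}$ in processor $P_R$
Output: vector $\mathbf{X}$ in processor $P_L$, vector $\mathbf{Y}$ in processor $P_R$
Ensure: $\langle \mathbf{X}, \mathbf{Y} \rangle = \langle \mathbf{L}, \mathbf{R} \rangle \otimes \langle \mathbf{K}, \mathbf{Q} \rangle$

  for $i = 1$ to $n$ do
    for $j = 1$ to $n$ do
      $U_{(i-1)n+j} \leftarrow L_i \otimes K_j$   (in $P_L$)
      $V_{(i-1)n+j} \leftarrow R_i \otimes Q_j$   (in $P_R$)
    end for
  end for
  $(\mathbf{U}, \mathbf{V}) \leftarrow$ IPRefresh($\mathbf{U}, \mathbf{V}$)
  $\mathbf{A} = (U_1, \ldots, U_n)$;  $\mathbf{C} = (U_{n+1}, \ldots, U_{n^2})$
  $\mathbf{B} = (V_1, \ldots, V_n)$;  $\mathbf{D} = (V_{n+1}, \ldots, V_{n^2})$
  $Z \leftarrow$ IPUnmask($\mathbf{C}, \mathbf{D}$)
  $\mathbf{Y} \leftarrow$ IPHalfMask($Z, \mathbf{A}$)
  $\mathbf{X} = \mathbf{A}$
  $\mathbf{Y} = \mathbf{Y} \oplus \mathbf{B}$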
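The masked multiplication can be sketched as follows: the pairwise products
form two vectors of length $n^2$ whose inner product is already the desired
result, and the remaining steps compress them back to length $n$. The field
representation (AES polynomial 0x11B), the helper names and the rejection
sampling in the refresh are assumptions of this illustration.

```python
import secrets


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Inverse of a non-zero byte, computed as a^254."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result


def ip_unmask(L, R):
    x = 0
    for l, r in zip(L, R):
        x ^= gf_mul(l, r)
    return x


def ip_half_mask(x, L):
    R = [0] + [secrets.randbelow(256) for _ in L[1:]]
    acc = x
    for l, r in zip(L[1:], R[1:]):
        acc ^= gf_mul(l, r)
    R[0] = gf_mul(acc, gf_inv(L[0]))
    return R


def ip_mask(x, n):
    L = [1 + secrets.randbelow(255) for _ in range(n)]
    return L, ip_half_mask(x, L)


def ip_refresh(L, R):
    A = []
    for l in L:
        a = secrets.randbelow(256)
        while a == l:
            a = secrets.randbelow(256)
        A.append(a)
    L_new = [l ^ a for l, a in zip(L, A)]
    B = ip_half_mask(ip_unmask(A, R), L_new)
    return L_new, [r ^ b for r, b in zip(R, B)]


def ip_mult(L, R, K, Q):
    """Return (X, Y) with <X, Y> = <L, R> * <K, Q>."""
    n = len(L)
    U = [gf_mul(l, k) for l in L for k in K]  # U[(i, j)] = L_i * K_j
    V = [gf_mul(r, q) for r in R for q in Q]  # V[(i, j)] = R_i * Q_j
    U, V = ip_refresh(U, V)
    A, C = U[:n], U[n:]
    B, D = V[:n], V[n:]
    Z = ip_unmask(C, D)        # contribution of the n^2 - n trailing shares
    Y = ip_half_mask(Z, A)     # re-express it under the left vector A
    return A, [y ^ b for y, b in zip(Y, B)]
```

The first step is correct because
$\bigoplus_{i,j} (L_i \otimes K_j) \otimes (R_i \otimes Q_j) = \langle \mathbf{L}, \mathbf{R} \rangle \otimes \langle \mathbf{K}, \mathbf{Q} \rangle$.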


Squaring. The procedure IPSquare() can be carried out quite efficiently in
the masked domain given that we work over a field of characteristic 2. Let the
input masked variable be $(\mathbf{L}, \mathbf{R})$. The output masked variable
$(\mathbf{X}, \mathbf{Y})$ can be calculated by squaring all elements of each
vector independently, i.e. $X_i = (L_i)^2$ and $Y_i = (R_i)^2$. The masked
squaring operation does not require refreshing the masks, and can thus be
carried out with only $2n$ field squarings.
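The reason share-wise squaring is correct can be written out in one line. The
derivation below only uses Eq. (1) and the fact that squaring is additive in
characteristic 2; it is added here as a worked step and is not taken from the
original text.

```latex
\begin{align*}
X^2 &= \Bigl(\bigoplus_{i=1}^{n} L_i \otimes R_i\Bigr)^{2}
     = \bigoplus_{i=1}^{n} (L_i \otimes R_i)^{2}
     && \text{(each cross term appears twice and cancels)} \\
    &= \bigoplus_{i=1}^{n} L_i^{2} \otimes R_i^{2}
     = \bigl\langle (L_1^{2}, \ldots, L_n^{2}),\, (R_1^{2}, \ldots, R_n^{2}) \bigr\rangle .
\end{align*}
```

Since squaring is a bijection on $\mathbb{F}_{2^8}$ that fixes zero, every
$L_i^2$ is non-zero whenever $L_i$ is, so the requirement on the left vector is
preserved.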

2.2 Complexity of Operations 

The complexity of the main operations in the IP masked domain, namely
addition and multiplication, is given in Table 1. We also provide a comparison
with some masked operations that can be implemented at any order $d$, recently
published in the literature for boolean and polynomial masking schemes, namely
[18, 24, 27]. The complexity numbers are given in terms of $d$ for all the
schemes, where $d$ indicates the number of random values in each masked
variable. Recall that in IP masking, this number of random values is given by
$d = 2n - 1$, with $n \geq 2$.

As shown in Table 1, the complexity of the addition operation in IP masking
is slightly larger than in the other proposed methods. This is mainly due to
the internal use of the IPRefresh() operation which, as opposed to the other
masking schemes, involves several field multiplications. However, the results
obtained for the multiplication operation are favourable for IP masking. In
particular, both polynomial masked multiplications have complexity $O(d^3)$
while IP masked multiplications have complexity $O(d^2)$. The boolean masked
multiplication has a similar complexity but, as we will show in the next
sections, the masking scheme itself provides considerably less security from
both practical and theoretical points of view.
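The field addition and multiplication counts for IP masking can be re-derived
from the operation breakdowns given in Section 2.1. The per-primitive counts
below are this sketch's own accounting (field inversions and random draws are
not counted), and the closed forms are the inner product entries of Table 1
for $\oplus$ and $\otimes$.

```python
def total(*costs):
    """Sum several {'xor': ..., 'mul': ...} cost records."""
    return {
        "xor": sum(c["xor"] for c in costs),
        "mul": sum(c["mul"] for c in costs),
    }


def unmask_cost(m):
    # Inner product of two length-m vectors: m products, m - 1 additions.
    return {"xor": m - 1, "mul": m}


def half_mask_cost(m):
    # m - 1 products L_i * R_i, m - 1 additions, one product with L_1^-1.
    return {"xor": m - 1, "mul": m}


def refresh_cost(m):
    # L' = L + A and R' = R + B (2m additions), one unmask, one half-mask.
    return total({"xor": 2 * m, "mul": 0}, unmask_cost(m), half_mask_cost(m))


def ip_add_cost(n):
    # Three vector additions, two refreshes, one unmask, one half-mask.
    return total({"xor": 3 * n, "mul": 0}, refresh_cost(n), refresh_cost(n),
                 unmask_cost(n), half_mask_cost(n))


def ip_mult_cost(n):
    # 2n^2 products, refresh on size n^2, unmask on size n^2 - n,
    # half-mask on size n, one final vector addition.
    return total({"xor": n, "mul": 2 * n * n}, refresh_cost(n * n),
                 unmask_cost(n * n - n), half_mask_cost(n))


def table_add(d):
    return {"xor": (13 * d + 1) // 2, "mul": 3 * d + 3}


def table_mult(d):
    return {"xor": (5 * d * d + 12 * d - 9) // 4,
            "mul": (5 * d * d + 10 * d + 5) // 4}
```

With $d = 2n - 1$ the two accountings agree, e.g. `ip_mult_cost(2)` and
`table_mult(3)` both give 18 additions and 20 multiplications.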




Table 1. Complexity of IP masked operations and comparison to $d$th-order boolean
masked operations and polynomial masked operations in the literature
(operations in $\mathbb{F}_{2^8}$)

Masked operation  Scheme         $\oplus$               $\otimes$             Rand
ADDITION          Boolean        d + 1                  -                     -
                  Polynomial     d + 1                  -                     -
                  Polynomial     d + 1                  -                     -
                  Inner Product  (13d + 1)/2            3d + 3                (7d + 3)/2
MULTIPLICATION    Boolean        d^2 + d + 1            2d^2 + 2d             (d^2 + d)/2
                  Polynomial     2d^3 + 7d^2 + d        2d^3 + 5d^2 + 5d      2d^2 + d
                  Polynomial     4d^3 + 8d^2 + 7d + 2   4d^3 + 8d^2 + 3d      2d^2 + d
                  Inner Product  (5d^2 + 12d - 9)/4     (5d^2 + 10d + 5)/4    (3d^2 + 8d - 3)/4


3 Security Evaluation 

In this section we evaluate the SCA resistance of IP masking and compare it 
to that of other masking schemes that can be implemented at any order, e.g. 
boolean masking and polynomial masking. We focus the analysis on the masking 
schemes themselves, i.e. we analyze the leakage of the shares of one masked 
value. We will show in the next section that the security relevant properties of 
IP masking carry over to the basic operations in the masked domain. 

Attack Order. We begin the evaluation by deriving the minimum order for an 
attack against IP masking. For this we need the following definitions: 

Definition 1: We say that a variable is sensitive, if it is an intermediate
result in an implementation that leaks through side channels, and if it is a
function of the input (resp. output), the key and possibly other constants that
is not constant with respect to the key [22].

Definition 2: We say that a masking scheme is $d$th-order SCA secure, if every
tuple of $d$ or fewer shares is independent of the variable that is masked.
Accordingly, a masked implementation of an algorithm is $d$th-order SCA secure,
if every tuple of $d$ or fewer intermediate variables is independent of any
sensitive variable.

1st-order SCA resistance. Clearly, IP masking with $n \geq 2$ is 1st-order SCA
secure. This is a simple consequence of the fact that, even if the value of one
of the shares in $\mathbf{L}$ or $\mathbf{R}$ is known (in the worst case one
$R_i$ is known to be zero such that $L_i \otimes R_i = 0$), the value of the
variable that is masked is still information theoretically hidden by the
$\oplus$ with $n - 1$ terms that are all uniformly distributed over
$\mathbb{F}_{2^8}$.

2 nd order SCA resistance. IP masking with n = 2 is not 2 nd order SCA secure. 
This is because the product of two values is determined to be zero if one of the 
values is zero. Multiplicative masking j2j suffers from the same problem dJ. 
Suppose that the values of $R_1$ and $R_2$ are known to be zero. Then,
$L_1 \otimes 0 \oplus L_2 \otimes 0 = s = 0$. This leads to a bias in the distribution
$p(S = s \mid R_1 = r_1, R_2 = r_2)$, and the mutual information $I(S; (R_1, R_2))$
is non-zero.

d-th order SCA resistance. IP masking with 2n = d + 1 is SCA secure up to order $n - 1$
(or $(d+1)/2 - 1$), but not secure against SCA of order $n$ (or $(d+1)/2$). Following
the above examples, as long as the product of one pair $(L_i, R_i)$, $i \in \{1, \ldots, n\}$, is
unknown, the value of the variable s that is masked is still information theoretically
hidden. On the other hand, if $\forall i \in \{1, \ldots, n\}$ the value of $R_i$ is known to be zero,
then the value of s is known to be zero. However, the probability that this case
occurs is small and decreases rapidly with increasing n. More precisely, it is $2^{-8n}$.
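The inner-product relation behind these examples can be made concrete with a small sketch. It is an illustration under stated assumptions, not the paper's algorithms: the field is taken to be GF(2^8) with the AES reduction polynomial 0x11B, and the helper names (gf_mul, gf_inv, ip_mask, ip_unmask) are hypothetical. The masking routine draws nonzero $L_i$ and random $R_2, \ldots, R_n$ and solves for $R_1$, so that $S = \bigoplus_i L_i \otimes R_i$.

```python
import secrets

AES_POLY = 0x11B  # assumption: x^8 + x^4 + x^3 + x + 1, the AES field polynomial


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8) by shift-and-add (not constant time)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Invert a nonzero element as a^254 (the multiplicative group has order 255)."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def ip_unmask(L, R) -> int:
    """Recombine the secret as the inner product of the two share vectors."""
    s = 0
    for l_i, r_i in zip(L, R):
        s ^= gf_mul(l_i, r_i)
    return s


def ip_mask(s: int, n: int):
    """Hypothetical encoder: nonzero L_i, random R_2..R_n, R_1 solved for."""
    L = [secrets.randbelow(255) + 1 for _ in range(n)]
    R = [0] + [secrets.randbelow(256) for _ in range(n - 1)]
    partial = ip_unmask(L[1:], R[1:])
    R[0] = gf_mul(gf_inv(L[0]), s ^ partial)
    return L, R


L, R = ip_mask(0x53, 2)
recovered = ip_unmask(L, R)            # equals 0x53
all_zero_case = ip_unmask(L, [0, 0])   # equals 0 whatever L is
```

If every $R_i$ is zero the recombined value is zero regardless of L, which is the event of probability $2^{-8n}$ discussed above.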

In summary, IP masking with 2n = d + 1 can, in theory, be broken by an n-th
order SCA. On the other hand, similar to polynomial masking, it creates a much
more complex relation between the shares than boolean masking, which is known 
to be more difficult to exploit. Hence, we expect IP masking with 2n = d + 1 to 
provide much higher security in practice than boolean masking of order d+ 1, i.e. 
with the same number of random masks. Following this line, we opt to consider 
the leakage of all 2n or d + 1 shares in the following analysis, since an attack 
exploiting all shares is more powerful in an information theoretic sense, unless 
the noise levels are extremely high. 

In polynomial masking half of the shares are non-zero public constants and the 
other half are random and secret masks. In particular, there is no direct correspon- 
dence to the notion of a masked variable. In the rest of the paper we refer only to 
the random and secret shares, and their number determines the masking order. 
For example, polynomial masking of order d - 1 uses d random and secret shares,
and can theoretically be broken by a d-th order SCA. We will compare polynomial
masking of order d - 1 with boolean masking of order d (d + 1 shares, d masks)
and with IP masking of order 2n = d + 1 (d + 1 shares, d masks). One could expect
IP masking with 2n = d + 1 to provide a similar level of security as polynomial
masking of order d - 1, i.e. both schemes should provide similar security when
they use the same number of random and secret masks.

Information Leakage. As motivated and done in previous works |1 2I24I2SI2D| . 
we use the mutual information between a variable and the leakage of all shares of 
its masked representation as a figure of merit. We estimate it using simulations. 
For IP masking, we set n = 2 and let $R_2 \in_R \mathbb{F}_{2^8}$ and
$L_1, L_2 \in_R \mathbb{F}_{2^8} \setminus \{0\}$ such that
$S = L_1 \otimes R_1 \oplus L_2 \otimes R_2$. Boolean masking uses d + 1 shares
$(M_1, \ldots, M_d, V)$ where the $M_i \in_R \mathbb{F}_{2^8}$ and V is computed such that
$S = M_1 \oplus \ldots \oplus M_d \oplus V$ holds. We evaluate boolean masking for
$d \in \{1, 2, 3\}$. Polynomial masking uses d shares $(Y_1, \ldots, Y_d)$ with
$Y_i \in_R \mathbb{F}_{2^8}$ and d public Lagrange coefficients $(P_1, \ldots, P_d)$ with
$P_i \in_R \mathbb{F}_{2^8} \setminus \{0\}$ and pairwise distinct (IB]. We evaluate
polynomial masking for $d \in \{2, 3\}$.

To quantify the amount of information leaked, we need to model the relation 
between the value of a variable and its physical leakage. We follow the approach 
that is usual in the literature jl 212412b] : we model that a variable leaks its Ham- 
ming weight, that each share leaks independently of all other shares, and that 
the leakage of each share is affected by independent Gaussian noise. The latter 
serves to mimic the noise effects that affect physical measurements. Putting this 
together, we model the leakage of IP masking as 


$$\mathrm{Leak}(L, R) = (\mathrm{HW}(L_1) + n_1, \mathrm{HW}(R_1) + n_2, \mathrm{HW}(L_2) + n_3, \mathrm{HW}(R_2) + n_4),$$

the leakage of boolean masking as

$$\mathrm{Leak}(M_1, \ldots, M_d, V) = (\mathrm{HW}(M_1) + n_1, \ldots, \mathrm{HW}(M_d) + n_d, \mathrm{HW}(V) + n_{d+1}),$$

and the leakage of polynomial masking as

$$\mathrm{Leak}(Y_1, \ldots, Y_d) = (\mathrm{HW}(Y_1) + n_1, \ldots, \mathrm{HW}(Y_d) + n_d),$$

where the $n_i$ are independent Gaussian variables with mean zero and standard
deviation $\sigma$. The mutual information is then $I(S; \mathrm{Leak}(L, R))$,
$I(S; \mathrm{Leak}(M_1, \ldots, M_d, V))$ resp. $I(S; \mathrm{Leak}(Y_1, \ldots, Y_d))$.
The number of measurements that a Template Attack P, i.e. the worst case scenario of
a profiled attack, requires to achieve a given success probability is directly related
to this mutual information via $c \cdot I(\cdot\,;\cdot)^{-1}$, where the constant c is
related to the success probability J2H|.
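As a small illustration of this leakage model, the sketch below draws one noisy Hamming-weight sample per share and, for the noise-free limit only, computes the mutual information for first-order boolean masking exactly by enumeration. This is not the estimator used for the figure (which relies on simulations with noise); the function names leak and mi_boolean_noise_free are hypothetical.

```python
import math
import random
from collections import Counter


def hw(x: int) -> int:
    """Hamming weight of an 8-bit value."""
    return bin(x).count("1")


def leak(shares, sigma, rng=random):
    """One simulated sample: HW of each share plus independent Gaussian noise."""
    return tuple(hw(v) + rng.gauss(0.0, sigma) for v in shares)


def mi_boolean_noise_free() -> float:
    """Exact I(S; (HW(M), HW(S xor M))) in bits for uniform S and M, sigma = 0."""
    joint = Counter()
    for s in range(256):
        for m in range(256):
            joint[(s, hw(m), hw(s ^ m))] += 1
    total = 256 * 256
    p_leak = Counter()
    for (s, a, b), count in joint.items():
        p_leak[(a, b)] += count
    mi = 0.0
    for (s, a, b), count in joint.items():
        p_joint = count / total
        p_s = 1 / 256
        p_l = p_leak[(a, b)] / total
        mi += p_joint * math.log2(p_joint / (p_s * p_l))
    return mi


sample = leak([0x3C, 0xA5], sigma=0.2)   # two noisy Hamming-weight observations
mi_bits = mi_boolean_noise_free()
```

Without noise the two Hamming weights already determine the parity of HW(S), so the result is at least one bit; adding noise can only decrease it.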

Figure 1 shows plots of the mutual information ($\log_{10}$) between S and the
information leaked by all shares of its masked representation, over increasing
noise levels $\sigma$, for all masking schemes considered.¹ ²



Fig. 1. Mutual information ($\log_{10}$) over increasing noise standard deviation $\sigma$
for different masking schemes

¹ Note that the mutual information values we computed for boolean masking are
consistent with Figure 1 in [12] and Figure 3 in [24]. One has to take into account
that the Y-axis in those figures is erroneously labeled $\log_{10}$ while it should be log,„ p.

² For polynomial masking with d = 3, reasonably accurate estimation of the mutual
information values for high noise levels is beyond our computational budget.

The figure shows that IP masking with n = 2 leaks consistently less than
boolean masking with $d \in \{1, 2, 3\}$ across the range of tested noise levels, which
confirms our expectation. The advantage is more pronounced for low noise levels,
where e.g. for $\sigma = 0.2$ the information leakage of IP masking is about 2.5 orders
of magnitude(!) smaller than that of boolean masking. As expected, polynomial
masking with d = 2 leaks consistently more than IP masking with n = 2. Polynomial
masking with d = 3 provides a level of security very similar to IP masking
with n = 2 for low noise levels. However, contrary to what one could expect,
for high noise levels, polynomial masking with d = 3 leaks less than IP masking
with n = 2. There are several possible explanations for this observation. For
instance, IP masking with n = 2 involves two field multiplications while polynomial
masking with d = 3 involves three field multiplications, i.e. the algebraic
complexity of the masking is greater. Furthermore, IP masking with n = 2 is 1st
order SCA secure while polynomial masking with d = 3 is 2nd order SCA secure.
It is known that leakage of lower order is easier to exploit, in particular with 
increasing noise m ■ We leave the careful analysis of the observed difference in 
information leakage as an open question for future research. 

Discussion. Our evaluation shows that IP masking with n = 2 provides high 
security even if there is little noise. However, although the simulated scenario 
(Hamming weight leakage, independent leakage of each share, Gaussian noise) 
is standard in the practice-oriented literature, it is synthetic and in particular 
meets the requirement of the masking schemes for independent leakage perfectly. 
It can be hard to achieve this for real-world implementations that are affected 
by effects such as coupling (we show in the extended version j3| that glitches do 
not affect the security of IP masking). Clearly, our evaluation does not allow one to
blindly assume that an implementation of IP masking is secure. What it shows
is the level of security that a secure implementation of IP masking can provide. 
An interesting topic for future research is to analyze the security provided by 
a real-world implementation, and to analyze how violating a requirement, e.g. 
independent leakage, affects practical security. 

4 Theoretical Security Analysis 

In this section, we review some formal security properties of the IP masking. We 
give the basic security properties of the masking scheme itself, including very 
strong security guarantees with respect to non-adaptive leakages, and argue that 
these properties carry over to the basic operations in the masked domain. In the 
full version j3j we discuss further relaxations, and argue that our construction 
provides security against glitches similar to the results given in [23].

Notation. In the following we let F be a finite field, and we typically consider 
row vectors. We define the statistical distance between two random variables 
A, B over some set X as 

$$\Delta(A; B) := \sum_{x \in X} \frac{1}{2} \left| \Pr[A = x] - \Pr[B = x] \right|.$$
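For finite distributions this definition translates directly into code. The sketch below is a minimal illustration, assuming each distribution is given as a dictionary from outcomes to probabilities; the function name statistical_distance is hypothetical.

```python
from fractions import Fraction


def statistical_distance(p: dict, q: dict):
    """Half the L1 distance between two probability mass functions."""
    support = set(p) | set(q)
    return sum(abs(p.get(x, 0) - q.get(x, 0)) for x in support) / 2


uniform_bit = {0: Fraction(1, 2), 1: Fraction(1, 2)}
constant_zero = {0: Fraction(1)}
distance = statistical_distance(uniform_bit, constant_zero)  # Fraction(1, 2)
```

A distance of 0 means the two variables are identically distributed, and a distance of 1 means they have disjoint supports.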

4.1 Security Properties of IP Masking 

We have argued in Section 3 that even for small n, IP masking is robust to
(noisy) Hamming weight leakage from the different shares of the masking. In 
this section, we back up these observations with a theoretical analysis showing
strong security properties for IP masking that cannot be achieved, e.g. by linear
masking schemes such as Boolean masking or masking schemes based on Shamir 
secret sharing. The analysis strongly relies on techniques and results from [10,11].
We repeat here part of the arguments where changes to the construction and 
model are required to get practical constructions. For a more formal analysis and 
full proof details the reader is referred to [U3] . We emphasize that the theoretical 
analysis will typically require n > 130 to get meaningful security bounds. 

As mentioned in Sect. 2, we assume that the device that runs the masked
computation has two processors, $P_L$ and $P_R$, leaking independently. Let $S_L$
denote the state of processor $P_L$ and $S_R$ the state of processor $P_R$ (resp.), then the
adversary may interact with $\Omega(S_L, S_R)$ by sending functions $f_L$ and $f_R$ to the
oracle and getting back $f_L(S_L)$ and $f_R(S_R)$. The only additional requirement
that we make is that an adversary will not learn more than $\lambda$ bits from each
processor $P_L$ and $P_R$. We call such an adversary $\lambda$-limited and denote the
process of the adversary interacting with the leakage oracle by
$(\mathcal{A} \rightleftarrows \Omega(L, R))$. For simplicity, we always assume that the output
of $\mathcal{A}$ in the above leakage game is
$f_L^1(S_L), f_L^2(S_L), \ldots, f_R^1(S_R), f_R^2(S_R), \ldots$. We emphasize that by modeling
leakage in this way, we allow it to depend on any intermediate value that may be
computed during the computation of the two processors.

To analyze the security of an IP masked value S from some finite field $\mathbb{F}$, we
set $S_L := L$ and $S_R := R$, where $(L, R) \leftarrow \mathrm{IPMask}_n(S)$, and let the adversary
interact with the $\Omega(L, R)$ leakage oracle. The following lemma was proven in [11].


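Lemma 1. Let $n \in \mathbb{N}$ and let $\mathbb{F}$ be such that $n > \log(|\mathbb{F}|)$. For any
$1/2 > \delta > 0$, $\gamma > 0$, any two secrets $S, S' \in \mathbb{F}$ and any (unbounded)
$\lambda$-limited adversary $\mathcal{A}$ we have

$$\Delta\big((\mathcal{A} \rightleftarrows \Omega(\mathrm{IPMask}_n(S)));\ (\mathcal{A} \rightleftarrows \Omega(\mathrm{IPMask}_n(S')))\big) \leq \epsilon,$$

where $\lambda = (1/2 - \delta)\, n \log|\mathbb{F}| - \log \gamma^{-1} - 1$ and
$\epsilon = 2(|\mathbb{F}|^{3/2 - n\delta} + |\mathbb{F}|\gamma)$.

Informally, the lemma says that for any two (different) secrets S, S' no adversary
can distinguish between leakage from a masking of S and a masking of S'.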

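As a special case, this gives us the following corollary when the underlying
field is $\mathbb{F}_{2^8}$, namely:

Corollary 1. Let then for any two secrets $S, S' \in \mathbb{F}_{2^8}$ and any $\lambda$-limited
adversary $\mathcal{A}$, we have

$$\Delta\big((\mathcal{A} \rightleftarrows \Omega(\mathrm{IPMask}_n(S)));\ (\mathcal{A} \rightleftarrows \Omega(\mathrm{IPMask}_n(S')))\big) \leq \epsilon,$$

where $\lambda = 3n$ and $\epsilon \leq 2^{13 - 0.1n} + 2^{-n}$.

Proof. Set $\delta := 0.1$ and $\gamma := 2^{-0.2n}$, then we get $\lambda = 3.2n - 0.2n - 1 = 3n$ and
$\epsilon \leq 2^{13 - 0.1n} + 2^{-n}$. □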

Corollary 1 says that for sufficiently large n an adversary may learn up to 3n
bits from each processor without being able to distinguish between a masking
of S and S'. We notice that the bound on the statistical distance only becomes
meaningful when n > 130, which, of course, is impractical.
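The threshold n > 130 can be read off by evaluating the bound numerically. The sketch below simply plugs n into the expressions as stated in Corollary 1 ($\lambda = 3n$, $\epsilon \leq 2^{13 - 0.1n} + 2^{-n}$); the function name corollary_bound is hypothetical.

```python
def corollary_bound(n: int):
    """Return (lambda, epsilon bound) as stated in Corollary 1 for a given n."""
    lam = 3 * n
    # 13 - 0.1 * n is written as (130 - n) / 10 to avoid rounding noise at n = 130
    eps = 2.0 ** ((130 - n) / 10) + 2.0 ** (-n)
    return lam, eps


for n in (2, 64, 130, 131, 200):
    lam, eps = corollary_bound(n)
    status = "meaningful" if eps < 1 else "vacuous"
    print(f"n={n:4d}  lambda={lam:4d} bits  eps<={eps:.3e}  ({status})")
```

A statistical distance is at most 1 by definition, so the bound only says something once the first term drops below 1, which happens for n > 130.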

One may ask if we can get stronger security guarantees for the masking scheme 
if we restrict our focus to certain special cases. To this end consider the case that 
the adversary cannot query the leakage oracle adaptively, i.e. he may learn only
$f_L(L)$ and $f_R(R)$. In this case, it is easy to show that from the fact that the inner
product is a strong randomness extractor im we can give to the adversary 
the entire L and up to 3n bits of R, and still it will be hard to decide whether
(L, R) was sampled from $\mathrm{IPMask}_n(S)$ or $\mathrm{IPMask}_n(S')$.

Comparison with Linear Masking Schemes. We notice that linear masking
schemes, such as the additive masking over finite fields [21,27], cannot achieve
such strong security properties in our security model. Consider a secret $S \in \mathbb{F}$
that is masked by vectors (L, R) such that (L, R) are uniformly random in $\mathbb{F}^{2n}$
subject to the constraint that $S = \sum_i L_i + \sum_i R_i$. If we consider an adversary
that can interact with $\Omega(L, R)$, then already a single field element of leakage
entirely breaks the security: f(L) may reveal $\sum_i L_i$ while g(R) reveals $\sum_i R_i$,
which together reveal S completely.

For fields of characteristic 2 such as $\mathbb{F}_{2^8}$, already a single bit of leakage suffices
to learn information about the secret! Recall that in characteristic 2 fields addi- 
tion works bit-wise. Similar arguments work for the polynomial masking based 
on Shamir secret sharing introduced in H3I, as Lagrange polynomial interpola- 
tion is linear. Hence, such masking schemes can be broken in our model. 
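The attack on additive masking is short enough to write down. The following sketch assumes byte-wide shares combined by XOR (addition in a characteristic-2 field); additive_mask, f and g are hypothetical names for the sharing routine and the two leakage functions.

```python
import secrets
from functools import reduce
from operator import xor


def additive_mask(s: int, n: int):
    """Split s into 2n byte shares whose XOR equals s; return the (L, R) halves."""
    shares = [secrets.randbelow(256) for _ in range(2 * n - 1)]
    shares.append(reduce(xor, shares, s))
    return shares[:n], shares[n:]


def f(L):
    """Leakage function for the left processor: one field element."""
    return reduce(xor, L, 0)


def g(R):
    """Leakage function for the right processor: one field element."""
    return reduce(xor, R, 0)


secret = 0xA7
L, R = additive_mask(secret, n=4)
recovered = f(L) ^ g(R)                  # the full secret from two field elements
low_bit = (f(L) & 1) ^ (g(R) & 1)        # one bit of leakage per side gives one secret bit
```

Both leakage functions only see one half of the shares, so they are admissible in the split-state model, yet together they reveal S because recombination is linear.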

We emphasize that our leakage model includes certain classes of leakages that
are very frequently used in practice, e.g. to model power consumption. One
example is the Hamming weight leakage model. Of course, our theoretical analysis
includes Hamming weight leakages, as an adversary can learn the Hamming
weight of a masked value and still the masked value remains information
theoretically hidden. More precisely, as shown in Corollary 1, the IP masking remains
provably secure even if an adversary learns L completely and 3n bits of R. As
Hamming weight is a linear function we can compute the Hamming weight of
(L, R) from just the Hamming weight of L and R separately. Notice that the
Hamming weight of R can be compressed to < 4 log n bits, while according to
Corollary 1 we are allowed to learn 3n bits of R. We emphasize that an adversary
may even learn the individual Hamming weight of each share $R_1, \ldots, R_n$ of
the right vector and still the IP masking remains secure. This is easy to see as
we can describe the Hamming weight of the n shares for sufficiently large n by
at most n log(8) = 3n bits, which according to Corollary 1 an adversary may
learn from R. We emphasize that for additive masking schemes, such as Boolean
masking, it is not known whether such strong security guarantees hold.


4.2 Security of Masked Operations 

So far, we looked at the robustness of the IP masking scheme in the presence 
of independent leakage, when we mask a secret value (or several secret values) 
and store the left part L on processor $P_L$, while R is stored on processor $P_R$. In
the following, we “lift” the security analysis from just masking the secret state,
e.g. the key of the AES, to arbitrary computation with masked values. More 
precisely, we describe why leakage from operations on masked values will not 
help to learn more about the masked value than just the leakage from a single 
masking. This can be viewed as a reduction from the security of “complicated” 
masked computation, to the security of a single masked value. The details of this 
analysis can be found in the full version. 

In the security proof, we follow Dziembowski and Faust [11] and show two
simple properties for the basic masked operations. These properties were
introduced in [11] and are called rerandomizing and reconstructability. The first
guarantees that for a masked operation the encoded output of the operation is 
distributed as a uniformly and independently sampled encoding. We show in the 
full version that all our masked operations satisfy this property. We notice that 
the algorithms for squaring and multiplication by a constant require only local 
computation, and hence do not require a refreshing. 

To show reconstructability for a masked operation, we need to build a
reconstructor. A reconstructor is a simulator that, given the operation's masked inputs
and outputs, can reproduce the internal computation of the operation. The main
requirement is that leakage from the reconstructor's output distribution (namely
the internal computation) is indistinguishable from the leakage obtained from a
real execution of the operation. At an intuitive level, this property guarantees
that leakage from the internals of a masked operation will not reveal “more”
information about the underlying secret than just the leakage from the masked
inputs and outputs themselves.

For practical reasons, we slightly adapt the construction of [11]. The three
main differences are as follows: (1) the way in which we refresh masked secrets, 
(2) dedicated efficient masked operations for squaring and multiplication by a 
constant, and (3) a simplified masked multiplication operation (instead of a 
NAND we only build a simple multiplication). We discuss some details below. 
A more thorough discussion is deferred to the full version. 

In the implementation, we use Algorithm |3 to refresh a masking of ( L,R ), 
which is a simple variant of the scheme given in mi To enable a security proof, 
we will in the following assume that the refreshing does not leak. This is re- 
quired as Dziembowski and Faust show a theoretical attack on a similar refresh- 
ing scheme in [Oj . Unfortunately, their attack also applies on the refreshing from 
Algorithm |3 The attack presented in jHj recovers the masked secret and requires 
an adversary to learn for n consecutive rounds the exact value of 3 field ele- 
ments. While in theory such an attack completely breaks the masking scheme, 
we emphasize that for a real-world adversary it is very hard to learn the ex- 
act value of field elements. If learning the exact value of 3 field elements over 
n consecutive rounds is possible, then from a practical point of view it seems 
hard to argue why the adversary should not be able to learn the exact value 
of all 2 n shares in one round of the refreshing. Notice also that practical SCA 
attacks typically require some knowledge of the inputs/outputs of the algorithm. 
For the refreshing algorithm this is not possible as both inputs and outputs are 
unknown and random. This makes attacking the refreshing a hard target. One
may ask why we do not use alternative approaches of provably secure refreshing 
as presented in |0| and m- Our choice is motivated by practical limitations as 
existing refreshing schemes result in a quadratic blow-up. 

5 Performance Evaluation 

In this section we evaluate the performance and correctness of IP masking. We 
provide a general overview on how to implement the IP masking building blocks 
on an 8-bit embedded platform, and describe how to use them to protect an 
implementation of the AES. 

5.1 Implementation of Masked Operations 

The 8-bit Atmel AVR ATmega128 |Bj is chosen as target platform. This device
provides an advanced RISC architecture with 133 low-level instructions and it
offers 128 kBytes of flash program memory and 4 kBytes of internal SRAM. The
independent side channel leakage required by our model is achieved in our
implementation by temporal separation, i.e. instead of using two physically separated
processors $P_L$ and $P_R$, we use a single 8-bit processor and we ensure independent
leakage by not overlapping their respective operations.

For the sake of optimization, we have implemented all operations in assembly
language. The ATmega128 does not provide an internal random number generator
to implement the rand() and randNonZero() functionalities. Therefore, and
only for the purposes of evaluating the implementation, the required random
bytes are provided to the microcontroller externally prior to the encryption
process. We note that modern devices with built-in TRNG or PRNG elements
running in parallel would allow generating such randomness internally.

Addition in $\mathbb{F}_{2^8}$ is carried out in a single clock cycle via the available XOR
instruction, whereas the rest of the field operations (multiplication, inversion, raising
to the power of 2) are implemented via lookup tables, requiring a total of 1,536
bytes in program memory. Besides the squaring, we have also implemented as
lookup tables the raising to the powers of 4 and 16 required in the power function
of the AES SubBytes step (see extended version for more details |2j). On devices 
with limited program memory these raisings can be alternatively carried out by 
consecutive squarings, effectively saving 512 bytes of program memory. 
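The memory trade-off can be illustrated as follows. This is a sketch in Python rather than the AVR assembly of the implementation, assuming the AES field polynomial 0x11B; the names SQUARE, pow4 and pow16 are hypothetical. Only the squaring table is kept, and the fourth and sixteenth powers are obtained by chaining lookups, which is what removes two 256-byte tables.

```python
AES_POLY = 0x11B  # assumption: AES reduction polynomial


def gf_mul(a: int, b: int) -> int:
    """Reference shift-and-add multiplication in GF(2^8), used to build the table."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
        b >>= 1
    return result


# One 256-byte lookup table for squaring.
SQUARE = [gf_mul(x, x) for x in range(256)]


def pow4(x: int) -> int:
    """x^4 as two consecutive squarings (two table lookups)."""
    return SQUARE[SQUARE[x]]


def pow16(x: int) -> int:
    """x^16 as four consecutive squarings (four table lookups)."""
    return SQUARE[SQUARE[pow4(x)]]


saved_bytes = 2 * 256  # the two tables for x^4 and x^16 that are no longer stored
```

The price is a few extra lookups per call, in exchange for the 512 bytes mentioned above.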

Special care has been taken in order to make the implementation not only
time-constant, but also flow-constant, i.e. conditional execution paths, which can be a
potential source of side channel leakage, have been avoided. A typical example of
a function with conditional execution is the multiplication in $\mathbb{F}_{2^8}$ using log/alog
tables. This method only works when both input operands are nonzero;
otherwise, the result of the multiplication must be equal to zero. Implementing
this routine in constant flow requires calculating the potential outputs of all
conditional paths, and thus it ends up requiring 22 clock cycles.
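One possible way to remove the zero-operand branch is sketched below. It is not the assembly routine of the implementation: it assumes the AES polynomial 0x11B with generator 0x03, always performs the table lookups, and then masks the result to zero when an operand is zero. All names are hypothetical, and Python itself gives no timing guarantees; only the control flow is illustrated.

```python
AES_POLY = 0x11B   # assumption: AES reduction polynomial
GENERATOR = 0x03   # assumption: generator of the multiplicative group


def xtime(a: int) -> int:
    """Multiply by x in GF(2^8); the reduction is selected by a mask, not a branch."""
    a <<= 1
    return (a ^ (AES_POLY & -(a >> 8))) & 0xFF


# ALOG[i] = GENERATOR^i, with ALOG[255] = 1 so that index 255 wraps to exponent 0.
ALOG = [1] * 256
LOG = [0] * 256    # LOG[0] is a dummy entry, never used for a meaningful result
x = 1
for i in range(255):
    ALOG[i] = x
    LOG[x] = i
    x ^= xtime(x)  # multiply by 0x03 = x + 1


def nonzero_mask(v: int) -> int:
    """0xFF if v != 0, else 0x00, computed without a conditional."""
    return (-v >> 8) & 0xFF


def gf_mul_const_flow(a: int, b: int) -> int:
    s = LOG[a] + LOG[b]
    s = (s & 0xFF) + (s >> 8)          # fold the sum into 0..255 (255 maps to ALOG[255] = 1)
    return ALOG[s] & nonzero_mask(a) & nonzero_mask(b)


def gf_mul_ref(a: int, b: int) -> int:
    """Branching shift-and-add reference, used only to check the sketch."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result
```

Every call executes the same sequence of operations whether or not an operand is zero, which is the flow-constant property described above.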

Worth mentioning is the implementation of the first part of Algorithm |3 for 
mask refreshing, namely sampling a vector A such that $A_i \neq L_i$ for $1 \leq i \leq n$.
This step is carried out as follows for each element $A_i$. First, we sample two
elements $A'_i \in \mathbb{F}_{2^8}$ and $A''_i \in \mathbb{F}_{2^8} \setminus \{0\}$. If $A'_i \neq L_i$ we simply
set $A_i = A'_i$; otherwise, we assign $A_i = A'_i \oplus A''_i$. Independently of the sampled
values $A'_i$ and $A''_i$, this conditional statement ensures that i) the final value $A_i$ is
different from $L_i$, and ii) the final value of $A_i$ is uniformly distributed over
$\mathbb{F}_{2^8} \setminus \{L_i\}$. Needless to say, this step is also implemented with constant
execution flow to prevent conditional execution branches.
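The selection step can be sketched without a branch as follows. This is an illustration in Python, not the assembly code of the implementation; select_branch_free and sample_not_equal are hypothetical names, and Python's secrets module stands in for the externally supplied randomness.

```python
import secrets


def select_branch_free(a_prime: int, a_second: int, l_i: int) -> int:
    """Return a_prime if it differs from l_i, else a_prime xor a_second, without branching."""
    diff = a_prime ^ l_i
    equal_mask = ((diff - 1) >> 8) & 0xFF   # 0xFF exactly when diff == 0
    return a_prime ^ (a_second & equal_mask)


def sample_not_equal(l_i: int) -> int:
    """Sample A_i != L_i from one arbitrary byte and one nonzero byte."""
    a_prime = secrets.randbelow(256)        # A'_i in GF(2^8)
    a_second = secrets.randbelow(255) + 1   # A''_i in GF(2^8) without 0
    return select_branch_free(a_prime, a_second, l_i)


a_i = sample_not_equal(0x53)   # never equal to 0x53
```

Counting over all 256 · 255 input pairs, each of the 255 admissible values is produced exactly 256 times: 255 times directly as $A'_i$ and once through the correction $A'_i \oplus A''_i$, so the output is uniform over the values different from $L_i$.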


5.2 Application to the AES 

We have implemented and verified the correctness of a protected instance of 
AES using the IP masking scheme with n = 2. Due to space restrictions, we 
provide a high-level description about how to apply IP masking to the AES in 
the extended version of this work |2j. As shown in Table 2, our implementation
requires around $1.9 \cdot 10^6$ clock cycles to perform a protected AES encryption
(including on-the-fly key schedule calculation).


Table 2. Performance evaluation (in clock cycles) of AES round transformations and
AES encryption with IP masking scheme with n = 2

Operation                 Clock cycles
AddRoundKey               8,796
SubBytes (Inverse)        45,632
SubBytes (Aff.Transf.)    72,128
ShiftRows                 200
MixColumns                27,468
Full AES                  1,912,000

We stress that these results should not be simply taken as an indicator to
judge the practicality of IP masking, as they are obtained using a legacy
general-purpose device without any type of hardware enhancements. If multiplication in
$\mathbb{F}_{2^8}$ were available in the instruction set of the controller, our timing for AES
encryption would be instantly reduced to less than a million cycles. This could
be achieved e.g. by providing instruction set extensions to the target device.

6 Conclusion

This work narrows the gap between the fields of theoretical leakage resilient
cryptography and practice-oriented research, and it represents a first joint step
towards the development and evaluation of common masking schemes. Although
the levels of security required for each model differ considerably, we expect tighter
bounds that allow lowering the value of the security parameters as the theory of
leakage resilient cryptography advances. At the same time, technology advances
steadily and what was impractical yesterday will be “normal” tomorrow. As a
consequence one might expect that schemes such as IP masking can become
practical for higher security levels.